



THE ANNALS 

of 

MATHEMATICAL 

STATISTICS 

(Founded by H. C. Carver)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XVIII 


1947 



THE ANNALS 

OF MATHEMATICAL STATISTICS 


EDITED BY
S. S. WILKS, Editor

M. S. BARTLETT
WILLIAM G. COCHRAN
ALLEN T. CRAIG
C. C. CRAIG
HARALD CRAMÉR
W. EDWARDS DEMING
J. L. DOOB
W. FELLER
HAROLD HOTELLING
J. NEYMAN
WALTER A. SHEWHART
JOHN W. TUKEY
A. WALD

WITH THE COOPERATION OF

T. W. Anderson, Jr.
J. H. Curtiss
J. F. Daly
Harold F. Dodge
Paul S. Dwyer
Churchill Eisenhart
M. A. Girshick
Paul R. Halmos
Paul G. Hoel
Mark Kac
William G. Madow
Alexander M. Mood
Frederick Mosteller
Henry Scheffé
Jacob Wolfowitz


The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given 
issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. 

Manuscripts for publication in the Annals of Mathematical Statistics
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 

The subscription price for the Annals is $5.00 per year. Single copies $1.50. 
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 

Composed and Printed at the 
WAVERLY PRESS, Inc. 

Baltimore, Md., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879




CONTENTS OF VOLUME XVIII

Articles

Aroian, Leo A. The Probability Function of the Product of Two Normally
Distributed Variables 265

Bartlett, M. S. The General Canonical Correlation Distribution 1

Birnbaum, Z. W., Raymond, J., and Zuckerman, H. S. A Generalization
of Tshebyshev’s Inequality to Two Dimensions 70

Blackwell, David. Conditional Expectation and Unbiased Sequential
Estimation 105

Brown, George W. Discriminant Functions 514

Craig, Allen T. Bilinear Forms in Normally Correlated Variables 565

Cramér, Harald. Problems in Probability Theory 165

Grubbs, Frank E., and Morse, Anthony P. The Estimation of Dispersion
from Differences 194

Gumbel, E. J. The Distribution of the Range 384

Hastings, Cecil, Jr., Mosteller, Frederick, Tukey, John W., and
Winsor, Charles P. Low Moments for Small Samples: A Comparative
Study of Order Statistics 413

Hoel, Paul G. Discriminating Between Binomial Distributions 556

Kimball, Bradford F. Some Basic Theorems for Developing Tests of Fit
for the Case of the Non-Parametric Probability Distribution Function, I 540

Kozakiewicz, W. On the Convergence of Sequences of Moment Generating
Functions 61

Kullback, S. On the Charlier Type B Series 574

Lehmann, E. L. On Families of Admissible Tests 97

Lehmann, E. L. On Optimum Tests of Composite Hypotheses with One
Constraint 473

Leipnik, R. B. Distribution of the Serial Correlation in a Circularly
Correlated Universe 80

Malmquist, Sten. A Statistical Problem Connected with the Counting of
Radioactive Particles 255

Mann, H. B., and Whitney, D. R. On a Test of Whether One of Two
Random Variables is Stochastically Larger than the Other 50

McCarthy, Philip J. Approximate Solutions for Means and Variances in
a Certain Class of Box Problems 349

Mises, R. v. On the Asymptotic Distribution of Differentiable Statistical
Functions 309

Montroll, Elliott W. On the Theory of Markoff Chains 18

Morse, Anthony P., and Grubbs, Frank E. The Estimation of Dispersion
from Differences 194

Mosteller, Frederick, Hastings, Cecil, Jr., Tukey, John W., and
Winsor, Charles P. Low Moments for Small Samples: A Comparative
Study of Order Statistics 413

Olmstead, Paul S., and Tukey, John W. A Corner Test for Association 495




Raymond, J., Birnbaum, Z. W., and Zuckerman, H. S. A Generalization
of Tshebyshev’s Inequality to Two Dimensions 70

Santaló, L. A. On the First Two Moments of the Measure of a Random
Set

Smith, John H. Estimation of Linear Functions of Cell Proportions 231

Stein, Charles, and Wald, Abraham. Sequential Confidence Intervals
for the Mean of a Normal Distribution with Known Variance 427

Tukey, John W. Non-Parametric Estimation II. Statistically Equivalent
Blocks and Tolerance Regions--The Continuous Case 529

Tukey, John W., Hastings, Cecil, Jr., Mosteller, Frederick, and
Winsor, Charles P. Low Moments for Small Samples: A Comparative
Study of Order Statistics 413

Tukey, John W., and Olmstead, Paul S. A Corner Test for Association 495

Wald, Abraham. An Essentially Complete Class of Admissible Decision
Functions 549

Wald, Abraham, and Stein, Charles. Sequential Confidence Intervals
for the Mean of a Normal Distribution with Known Variance 427

Walsh, John E. Concerning the Effect of Intraclass Correlation on Certain
Significance Tests 88

Welker, E. L. The Distribution of the Mean 111

Whitney, D. R., and Mann, H. B. On a Test of Whether One of Two
Random Variables is Stochastically Larger than the Other 50

Winsor, Charles P., Hastings, Cecil, Jr., Mosteller, Frederick, and
Tukey, John W. Low Moments for Small Samples: A Comparative
Study of Order Statistics 413

Wolfowitz, J. The Efficiency of Sequential Estimates and Wald’s Equation
for Sequential Processes 215

Zuckerman, H. S., Birnbaum, Z. W., and Raymond, J. A Generalization
of Tshebyshev’s Inequality to Two Dimensions 70


Notes 

Albert, G. E. A Note on the Fundamental Identity of Sequential Analysis 593

Belz, Maurice H. Note on the Liapounoff Inequality for Absolute
Moments 604

Blackwell, D., and Girshick, M. A. A Lower Bound for the Variance of
Some Unbiased Sequential Estimates 277

Bowker, Albert H. On the Norm of a Matrix 285

Brown, George W. On Small-Sample Estimation 582

Castagnetto, Louis, and Cernuschi, Felix. Probability Schemes with
Contagion in Space and Time 122

Cernuschi, Felix, and Castagnetto, Louis. Probability Schemes with
Contagion in Space and Time 122

Fréchet, M. Definition of the Probable Deviation 288

Fréchet, M. The General Relation Between the Mean and Mode for a
Discontinuous Variate 290





Girshick, M. A., and Blackwell, D. A Lower Bound for the Variance of
Some Unbiased Sequential Estimates 277

Greville, T. N. E. Remark on the Note “A Generalization of Waring’s
Formula” 605

Harris, T. E. Note on Differentiation Under the Expectation Sign in the
Fundamental Identity of Sequential Analysis 294

Kac, M., and Siegert, A. J. F. An Explicit Representation of a Stationary
Gaussian Process 438

Oberg, E. N. Approximate Formulas for the Radii of Circles which Include
a Specified Fraction of a Normal Bivariate Distribution 442

Paulson, Edward. A Note on the Efficiency of the Wald Sequential Test 447

Pinney, Edmund. Fitting Curves with Zero or Infinite End Points 127

Robbins, H. E. Acknowledgement of Priority 297

Savage, L. J. A Uniqueness Theorem for Unbiased Sequential Binomial
Estimation 295

Scheffé, Henry. A Useful Convergence Theorem for Probability
Distributions 434

Siegert, A. J. F., and Kac, M. An Explicit Representation of a Stationary
Gaussian Process 438

Truesdell, C. A Note on the Poisson-Charlier Functions 450

Villars, D. S. A Significance Test and Estimation in the Case of
Exponential Regression 596

Wald, Abraham. A Note on Regression Analysis 586

Walsh, John E. An Extension to Two Populations of an Analogue of
Student’s t-Test Using the Sample Range 280

Walsh, John E. On the Power Efficiency of a t-Test Formed by Pairing
Sample Values 601

Welch, B. L. On the Studentization of Several Variances 118

Wintner, Aurel. On the Shape of the Angular Case of Cauchy’s
Distribution Curves 000

Wolfowitz, J. Consistency of Sequential Binomial Estimates 131

Zygmund, A. A Remark on Characteristic Functions 272

Miscellaneous 

Abstracts of Papers 298, 455, 607 

Annual Report of the Editor 159 

Annual Report of the President of the Institute 195

Annual Report of the Secretary-Treasurer 156

Constitution and By-Laws of the Institute 160

News and Notices 145, 301, 463, 612

Report on the Atlantic City Meeting of the Institute 306

Report on the Boston Meeting of the Institute 149

Report on the New York City Meeting of the Institute 468

Report on the April Meeting of the Institute in Atlantic City 469 

Report on the San Diego Meeting of the Institute 470 

Report on the New Haven Meeting of the Institute 616 





M. S. BARTLETT 


$p < q$, and the sample with $n$ degrees of freedom for the
dependent variate is divided in the usual way into a part
with $q$ degrees of freedom corresponding to the independent variate and the
remaining part with $n - q$ degrees of freedom. If $a_{ij}$, $b_{ij}$ denote the sums of
squares and products corresponding to this division, then it is known that the
joint distribution of $a_{ij}$ and $b_{ij}$, if the dependent vector variate is normal and
actually, in the statistical sense, independent of the second vector variate, is

\[
\frac{|A|^{\frac12(q-p-1)}\,|B|^{\frac12(n-q-p-1)}\exp\Bigl[-\tfrac12\sum_i (a_{ii}+b_{ii})\Bigr]\,da\,db}
{2^{\frac12 np}\,\pi^{\frac12 p(p-1)}\prod_{i=0}^{p-1}\bigl\{\Gamma[\tfrac12(q-i)]\,\Gamma[\tfrac12(n-q-i)]\bigr\}}
\tag{1}
\]

where $|A|$ denotes the determinant of the matrix $A = \{a_{ij}\}$, and $da$ the product
of differentials $da_{ij}$, and where for convenience the variance matrix of the
dependent variate is taken to be the unit matrix.

We make the transformation specified by

\[
A = WDW', \qquad A + B = WW', \tag{2}
\]

where $D$ is a diagonal matrix of the quantities $r_i^2$ in descending order of magnitude,
and $W = \{w_{ij}\}$ is a matrix (with transpose $W'$) uniquely determined by (2)
except for an ambiguity of sign for each column; this ambiguity can be eliminated
by choosing positive elements in the first row. The Jacobian $\Delta$ of the
transformation may be shown to be

\[
\Delta = 2^{p}\,|WW'|^{\frac12 p+1}\prod_{i=1}^{p}\prod_{j=i+1}^{p}(r_i^2 - r_j^2). \tag{3}
\]

By direct substitution, we obtain from (1) the distribution

\[
p(a_{ij}, b_{ij}) = p(w_{ij}, r_i^2) = p(w_{ij})\,p(r_i),
\]

where $p(x)$ is a general notation³ for a distribution function in one or more
variates $x$ (including the differential elements); for $p(w_{ij})$ and $p(r_i)$ we have

\[
p(w_{ij}) = C_1\,|WW'|^{\frac12(n-p)}\exp\Bigl[-\tfrac12\sum w_{ij}^2\Bigr]\,dw, \tag{4}
\]

\[
p(r_i) = C_2\prod_{i=1}^{p}(r_i^2)^{\frac12(q-p-1)}(1-r_i^2)^{\frac12(n-q-p-1)}\prod_{i<j}(r_i^2-r_j^2)\,dr^2, \tag{5}
\]

the constants $C_1$ and $C_2$ being arranged to give unity on integration of $p(w_{ij})$
or $p(r_i)$, i.e. we have

\[
C_1 = 2^{p-\frac12 np}\,\pi^{-\frac12 p^2}\prod_{i=0}^{p-1}\bigl\{\Gamma[\tfrac12(p-i)]/\Gamma[\tfrac12(n-i)]\bigr\}, \tag{6}
\]

(the $w_{ij}$ varying from $-\infty$ to $\infty$ except that $w_{1j} > 0$), and

\[
C_2 = \pi^{\frac12 p}\prod_{i=0}^{p-1}\bigl\{\Gamma[\tfrac12(n-i)]\big/\bigl(\Gamma[\tfrac12(p-i)]\,\Gamma[\tfrac12(q-i)]\,\Gamma[\tfrac12(n-q-i)]\bigr)\bigr\}. \tag{7}
\]

³ The probability symbol is not of course to be confused with the number $p$ of components
in the dependent variate. It should also be noted that for convenience $p(x_i)$ is used to
denote the joint probability for a set of quantities $x_i$, whereas $p(x_1)$ or $p(x_2)$ denotes the
probability for the specified variate $x_1$ or $x_2$ considered separately.
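As a numerical aside, the constant in (7) can be evaluated directly. The sketch below is only an illustration: it assumes the reading $C_2 = \pi^{p/2}\prod_{i=0}^{p-1}\Gamma[\tfrac12(n-i)]/\{\Gamma[\tfrac12(p-i)]\Gamma[\tfrac12(q-i)]\Gamma[\tfrac12(n-q-i)]\}$, and the function name `c2` is hypothetical. For $p = 2$, $q = 3$ it can be compared with the factor $\tfrac14(n-2)(n-3)(n-4)$ quoted later in section 5.

```python
from math import gamma, pi


def c2(n, p, q):
    """Normalizing constant C_2 of the null distribution (5), read from (7).

    Assumption: the product runs over i = 0, ..., p - 1.
    """
    value = pi ** (p / 2)
    for i in range(p):
        value *= gamma((n - i) / 2) / (
            gamma((p - i) / 2) * gamma((q - i) / 2) * gamma((n - q - i) / 2)
        )
    return value


def c2_special_case(n):
    """The closed form quoted in section 5 for p = 2, q = 3."""
    return (n - 2) * (n - 3) * (n - 4) / 4
```

Comparing `c2(n, 2, 3)` with `c2_special_case(n)` for a few values of `n` gives a quick consistency check between (7) and the worked example.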

3. Formal determination of the general distribution. The method to be 
adopted of obtaining the general distribution from the particular case quoted in 
equation (5) above is the same in principle as the one adopted by Fisher [7] in 
his derivation of the general distribution of the multiple correlation coefficient. 
Since the argument is more involved in the present problem, it will be presented 
first in formal probability terms, before the details of the solution are examined. 

We consider a transformation of the components of each vector variate to the
true canonical components. Let the observed ordinary correlation coefficients
of these mutually independent components for one vector variate with the
corresponding components of the second vector variate be denoted by $s_i$. The
true correlations are the true canonical correlations $\rho_i$. Then we have for the
general canonical correlation distribution, denoted by⁴ $p(r_i \mid \rho_i)$, the expression

\begin{align*}
p(r_i \mid \rho_i) &= \int_{s_i} p(r_i, s_i \mid \rho_i) \\
&= \int_{s_i} p(r_i \mid s_i, \rho_i)\,p(s_i \mid \rho_i) \\
&= \int_{s_i} p(r_i \mid s_i)\,p(s_1 \mid \rho_1)\,p(s_2 \mid \rho_2)\cdots p(s_p \mid \rho_p),
\end{align*}

the substitution $p(r_i \mid s_i)$ for $p(r_i \mid s_i, \rho_i)$ following from the sufficiency of the
independent correlations $s_i$ of the corresponding pairs of canonical components,
as statistics for the $\rho_i$. We now define the function $g(s_i, \rho_i)$ by the relation

\[
p(s_i \mid \rho_i) = p(s_i \mid \rho_i = 0)\,g(s_i, \rho_i),
\]

whence we have the general solution

\begin{align*}
p(r_i \mid \rho_i) &= \int_{s_i} p(r_i \mid s_i)\,p(s_1 \mid \rho_1 = 0)\,g(s_1, \rho_1)\,p(s_2 \mid \rho_2 = 0)\,g(s_2, \rho_2)\cdots \\
&= \int_{s_i} p(r_i, s_i \mid \rho_i = 0)\,g(s_1, \rho_1)\,g(s_2, \rho_2)\cdots \tag{8} \\
&= p(r_i \mid \rho_i = 0)\int_{s_i} p(s_i \mid r_i, \rho_i = 0)\,g(s_1, \rho_1)\,g(s_2, \rho_2)\cdots
\end{align*}

for $p(r_i \mid \rho_i)$ in terms of the special case $p(r_i \mid \rho_i = 0)$.


⁴ Quantities to the right of the vertical stroke in a probability bracket are given
quantities on which the probability distribution depends.





Now according as the independent vector variate is considered as (a) a normal
variate with which the dependent variate is correlated, (b) a fixed vector in
sample space (this includes the non-central means case), Fisher [7] has shown that
the distribution of the multiple correlation $R$ of a single dependent variate with
an independent variate comprising $m$ components is $p(R \mid \rho = 0)\,g(R, \rho)$, where

\[
\begin{aligned}
&\text{(a)}\quad g(R, \rho) = F(\tfrac12 n, \tfrac12 n; \tfrac12 m; \rho^2R^2)\,(1-\rho^2)^{\frac12 n},\\
&\text{(b)}\quad g(R, \rho) = F(\tfrac12 n; \tfrac12 m; \tfrac12\beta^2R^2)\,e^{-\frac12\beta^2},
\end{aligned}
\tag{9}
\]

where we replace $\rho^2$ by a parameter $\beta^2$ in case (b), and the notation for
hypergeometric functions used is:

\[
F(\alpha; \beta; x) = 1 + \frac{\alpha x}{\beta} + \frac{\alpha(\alpha+1)x^2}{\beta(\beta+1)\,2!} + \cdots,
\]

\[
F(\alpha_1, \alpha_2; \beta; x) = 1 + \frac{\alpha_1\alpha_2 x}{\beta} + \frac{\alpha_1(\alpha_1+1)\alpha_2(\alpha_2+1)x^2}{\beta(\beta+1)\,2!} + \cdots.
\]
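The two series just defined are easy to evaluate by partial sums, which gives a way of tabulating the factors $g(R, \rho)$ of (9). The following is a minimal sketch under stated assumptions: the function names are hypothetical, the series are simply truncated after a fixed number of terms, and the case (b) parameter is taken to be $\beta^2$ as read above.

```python
from math import exp


def hyp_1f1(alpha, beta, x, terms=80):
    """Partial sum of F(alpha; beta; x) = 1 + alpha x / beta + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # ratio of consecutive terms of the series
        term *= (alpha + k) / (beta + k) * x / (k + 1)
    return total


def hyp_2f1(alpha1, alpha2, beta, x, terms=400):
    """Partial sum of F(alpha1, alpha2; beta; x); intended for |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (alpha1 + k) * (alpha2 + k) / (beta + k) * x / (k + 1)
    return total


def g_case_a(R, rho, n, m):
    """Factor g(R, rho) of (9a): normal independent variate."""
    return hyp_2f1(n / 2, n / 2, m / 2, rho**2 * R**2) * (1 - rho**2) ** (n / 2)


def g_case_b(R, beta, n, m):
    """Factor g(R, rho) of (9b): fixed independent variate, parameter beta^2."""
    return hyp_1f1(n / 2, m / 2, 0.5 * beta**2 * R**2) * exp(-0.5 * beta**2)
```

Both factors reduce to unity when the parameter vanishes, as they must if $p(R \mid \rho = 0)\,g(R, \rho)$ is to return the null distribution.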


It follows that we may write $g(s_i, \rho_i)$ above in the form

\[
\begin{aligned}
&\text{(a)}\quad g(s_i, \rho_i) = F(\tfrac12 n, \tfrac12 n; \tfrac12; \rho_i^2 s_i^2)\,(1-\rho_i^2)^{\frac12 n},\\
&\text{(b)}\quad g(s_i, \rho_i) = F(\tfrac12 n; \tfrac12; \tfrac12\beta_i^2 s_i^2)\,e^{-\frac12\beta_i^2},
\end{aligned}
\tag{10}
\]

by putting $m = 1$ in (9) (the signs of the $s_i$ are arbitrary, so that we are
essentially concerned, as in the multiple correlation distribution, with the squares
of the correlations). From these series expansions the integral in (8) consists of
terms corresponding to the conditional moments, for any set of positive integers
$t_1, t_2, \cdots, t_p$,

\begin{align*}
\mu(t_1, t_2, \cdots, t_p) &= E\{(s_1^2)^{t_1}(s_2^2)^{t_2}\cdots(s_p^2)^{t_p} \mid r_i\} \\
&= \int (s_1^2)^{t_1}(s_2^2)^{t_2}\cdots(s_p^2)^{t_p}\,p(s_i \mid r_i, \rho_i = 0).
\end{align*}

In the particular case when only $\rho_1 \ne 0$, the moments $\mu(t) = E\{(s_1^2)^t \mid r_i\}$ from
the single factor $g(s_1, \rho_1)$ are all that arise, but in the general case it is important
to notice that the quantities $s_i^2$, while statistically independent when unrestricted,
are no longer independent for the conditional distribution $p(s_i \mid r_i, \rho_i = 0)$.
This completes the formal solution. It remains to evaluate $\mu(t_1, t_2, \cdots, t_p)$.


4. The conditional moment $\mu(t_1, t_2, \cdots, t_p)$. First of all we note from the
choice of the components of the dependent vector variate, applying the analysis
of section 2 to such components, that the multiple correlation $R_i$ between the
$i$th component and the $q$ components of the independent variate is given by

\[
R_i^2 = a_{ii}/(a_{ii} + b_{ii}) = \alpha_{i1}^2 r_1^2 + \alpha_{i2}^2 r_2^2 + \cdots + \alpha_{ip}^2 r_p^2,
\]

where

\[
\alpha_{ij} = w_{ij}\big/\sqrt{(w_{i1}^2 + w_{i2}^2 + \cdots + w_{ip}^2)}.
\]

To obtain the distribution of the $\alpha_{ij}$ from that of the $w_{ij}$, we note that the $w_{ij}$
distribution (4) is normal (allowing for convenience $w_{1j}$ to vary from $-\infty$ to $\infty$)
except for the “linkage factor”

\[
2^{-\frac12 p(n-p)}\,|WW'|^{\frac12(n-p)}\prod_{i=0}^{p-1}\bigl\{\Gamma[\tfrac12(p-i)]/\Gamma[\tfrac12(n-i)]\bigr\}.
\]

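Hence if we transform to the variables $c_{ii}$, $\theta_{ij}$ defined by

\[
c_{ii} = w_{i1}^2 + w_{i2}^2 + \cdots + w_{ip}^2,
\]

\[
\begin{aligned}
\alpha_{i1} &= \cos\theta_{i1},\\
\alpha_{i2} &= \sin\theta_{i1}\cos\theta_{i2},\\
\alpha_{i3} &= \sin\theta_{i1}\sin\theta_{i2}\cos\theta_{i3},\\
&\;\;\vdots\\
\alpha_{ip} &= \sin\theta_{i1}\sin\theta_{i2}\sin\theta_{i3}\cdots\sin\theta_{i,p-1},
\end{aligned}
\tag{11}
\]

the sets $c_{ii}$, $\theta_{ij}$, which for normal $w_{ij}$ would all be independent with distributions:

\[
\begin{aligned}
&p(c_{ii}) = \chi^2 \text{ distribution with } p \text{ degrees of freedom},\\
&p(\theta_{ij}) \propto \sin^{p-j-1}\theta_{ij}\,d\theta_{ij},\\
&(0 < \theta_{ij} < \pi \text{ for } j = 1, 2, \cdots, p-2;\; 0 < \theta_{i,p-1} < 2\pi),
\end{aligned}
\tag{12}
\]

in general retain their independence for given $i$, but the linkage factor results
in an elevation of the $\chi^2$ distributions to $n$ degrees of freedom, and a linkage
factor for the $\theta_{ij}$ distributions of

\[
|\Lambda|^{\frac12(n-p)}\prod_{i=0}^{p-1}\frac{\Gamma(\tfrac12 n)\,\Gamma[\tfrac12(p-i)]}{\Gamma[\tfrac12(n-i)]\,\Gamma(\tfrac12 p)},
\tag{13}
\]

where

\[
\Lambda \equiv \{\alpha_{i1}\alpha_{j1} + \alpha_{i2}\alpha_{j2} + \cdots + \alpha_{ip}\alpha_{jp}\}.
\]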

We may now, having obtained the distribution of the $\alpha_{ij}$, note their geometrical
interpretation. Let us denote the $p$ components of the dependent variate
in $n$-dimensional sample space by the $p$ vectors $\xi_1, \xi_2, \cdots, \xi_p$. Let the $p$
orthogonal canonical components corresponding to the sample canonical
correlations $r_i$ be denoted by the $p$ unit vectors $x_1, x_2, \cdots, x_p$. Let the corresponding
components for the independent variate be $\eta_i$, $y_i$. The “linkage factor”
merely represents the allowance that must be made in the mutual relations of the
$\xi$-vectors for the fact that while they must lie in the $p$-space of the $x$-vectors,
they really belong to the original $n$-space. We may identify the $w_{ij}$ with the
coefficients in the equation

\[
\xi_i = w_{i1}x_1 + w_{i2}x_2 + \cdots + w_{ip}x_p, \tag{14}
\]

where

\[
\xi_i^2 = w_{i1}^2 + w_{i2}^2 + \cdots + w_{ip}^2
\]

is a $\chi^2$ with $n$, and not $p$, degrees of freedom. If we now suppose for convenience
$\xi_i$ to be a unit vector, we have in place of (14)

\[
\xi_i = \alpha_{i1}x_1 + \alpha_{i2}x_2 + \cdots + \alpha_{ip}x_p, \tag{15}
\]

with a projection, on the $q$-space of the $y$-vectors, of $\zeta_i$, say, where

\[
\zeta_i = \alpha_{i1}r_1y_1 + \alpha_{i2}r_2y_2 + \cdots + \alpha_{ip}r_py_p,
\]

and hence, as already noted in the algebraic derivation,

\[
R_i^2 = (\xi_i \cdot \zeta_i)^2/\zeta_i^2 = \alpha_{i1}^2r_1^2 + \alpha_{i2}^2r_2^2 + \cdots + \alpha_{ip}^2r_p^2,
\]

where $(\xi \cdot \zeta)$ denotes a scalar product. The linkage factor (13) indicates that
the vectors $\xi_i$ in (15) are not independent in the $p$-space of the $x$-vectors, the
distribution of their mutual configuration being determined by $n$-space.

This interpretation enables us to determine the moments of the distribution
$p(s_i \mid r_i)$. For if corresponding to (15) we write

\[
\eta_i = \beta_{i1}y_1 + \beta_{i2}y_2 + \cdots + \beta_{iq}y_q, \tag{16}
\]

then

\[
s_i = \alpha_{i1}\beta_{i1}r_1 + \alpha_{i2}\beta_{i2}r_2 + \cdots + \alpha_{ip}\beta_{ip}r_p. \tag{17}
\]

If we are considering case (a), the relations of the $\eta_i$ to the $y$-vectors in $q$-space
will be similar to the relations of the $\xi_i$ to the $x$-vectors in $p$-space. In case (b),
however, the $\eta_i$, which represent the true canonical components of a set of $q$
fixed vectors, must remain strictly orthogonal to each other although their
relation to the $y$-vectors can vary. This means that the relations of the $\eta_i$
to the $y$-vectors are determined by a random rotation of a rigid orthogonal set
of $q$ vectors in case (b). We may note that if in case (a) we allowed $n$ to tend to
infinity, the $\eta_i$ would also become rigidly orthogonal, so that the solution in case
(b) may conveniently be obtained from case (a) by retaining the same
distribution of the $\alpha_{ij}$, and for the $\beta_{ij}$ letting $n \to \infty$.

Thus in either case the moments of the $s_i$ can be obtained from (17) in terms
of the moments of $\alpha_{ij}$ and $\beta_{ij}$, two independent sets of coefficients for which the
distribution of each set is known. The above comments suffice theoretically to
complete the required solution, for $(s_1^2)^{t_1}(s_2^2)^{t_2}\cdots(s_p^2)^{t_p}$ is a function of $\alpha_{ij}$ and
$\beta_{ij}$; the $\alpha_{ij}$ and the corresponding linkage factor can be expressed in terms of
$\sin\theta_{ij}$ and $\cos\theta_{ij}$, and similarly for the $\beta_{ij}$ in terms of, say, $\sin\phi_{ij}$ and $\cos\phi_{ij}$,
and integration carried out over the $\theta_{ij}$ and $\phi_{ij}$. This method is unfortunately
too cumbersome algebraically to be of any practical value except in the case of
one non-zero root. This case is considered separately before the general case is
discussed further.


5. The case of only one non-zero root. Here we only require $\mu(t)$ and a
comparatively simple solution is possible, the linkages within the $\xi_i$ and $\eta_i$ sets
being irrelevant. We have in fact, if $\psi$ is the angle between $\eta_1$ and $\zeta_1$ (where
$\zeta_1$ was the projection of $\xi_1$ in the $q$-space), that $\psi$ is a random angle in the $q$-space,
since the $\alpha_{ij}$ and $\beta_{ij}$ sets are independent. Hence in this particular case we may
conveniently write $s_1 = R_1\cos\psi$, which is just the transformation used to obtain
the distribution of the multiple correlation $R_1$. Thus we may replace (10) by
(9), where $R_1^2 = \alpha_{11}^2r_1^2 + \alpha_{12}^2r_2^2 + \cdots + \alpha_{1p}^2r_p^2$, and

\[
(R_1^2)^t = \sum_{u_1+u_2+\cdots+u_p=t}\frac{t!}{u_1!\,u_2!\cdots}\,(r_1^2)^{u_1}(r_2^2)^{u_2}\cdots
\cos^{2u_1}\theta_{11}\,\sin^{2(t-u_1)}\theta_{11}\,\cos^{2u_2}\theta_{12}\,\sin^{2(t-u_1-u_2)}\theta_{12}\cdots,
\]

where the expected value of the trigonometric term is evaluated as

\[
\frac{\Gamma(u_1+\tfrac12)\,\Gamma(u_2+\tfrac12)\cdots}{\Gamma(\tfrac12)\,\Gamma(\tfrac12)\cdots}\cdot\frac{\Gamma(\tfrac12 p)}{\Gamma(\tfrac12 p+t)}.
\tag{18}
\]
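Expression (18) is the mean of $\alpha_{11}^{2u_1}\alpha_{12}^{2u_2}\cdots\alpha_{1p}^{2u_p}$ for a direction distributed at random in $p$-space. A small numerical check is sketched below; it is illustrative only and assumes the reading of (18) as $\prod_j\{\Gamma(u_j+\tfrac12)/\Gamma(\tfrac12)\}\,\Gamma(\tfrac12 p)/\Gamma(\tfrac12 p+t)$. The comparison is made for $p = 2$, where $\alpha_{11} = \cos\theta$, $\alpha_{12} = \sin\theta$ with $\theta$ uniform on $(0, 2\pi)$; the function names are hypothetical.

```python
from math import cos, gamma, pi, sin


def trig_moment_formula(us):
    """Expected value (18) for exponents us = [u_1, ..., u_p]."""
    p, t = len(us), sum(us)
    value = gamma(p / 2) / gamma(p / 2 + t)
    for u in us:
        value *= gamma(u + 0.5) / gamma(0.5)
    return value


def trig_moment_circle(u1, u2, steps=2000):
    """Mean of cos^(2 u1) sin^(2 u2) over a uniform angle (the case p = 2).

    Uses the midpoint rule, which is essentially exact for trigonometric
    polynomials of low degree on a full period.
    """
    total = 0.0
    for k in range(steps):
        theta = 2 * pi * (k + 0.5) / steps
        total += cos(theta) ** (2 * u1) * sin(theta) ** (2 * u2)
    return total / steps
```

For example, `trig_moment_formula([1, 1])` can be set against `trig_moment_circle(1, 1)`, the mean of $\cos^2\theta\sin^2\theta$.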


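We have now obtained the distribution, ($\rho_2 = \cdots = \rho_p = 0$),

\[
p(r_i \mid \rho_1 \ne 0) = p(r_i \mid \rho_i = 0)\sum_{u_1,u_2,\cdots} C(u_1, u_2, \cdots),
\]

where $p(r_i \mid \rho_i = 0)$ is given by (5); and in case (a)

\[
C(u_1, u_2, \cdots) = (1-\rho_1^2)^{\frac12 n}\,
\frac{[\Gamma(\tfrac12 n+t)]^2\,\Gamma(\tfrac12 p)\,\Gamma(\tfrac12 q)}{[\Gamma(\tfrac12 n)]^2\,\Gamma(\tfrac12 p+t)\,\Gamma(\tfrac12 q+t)}
\prod_{j=1}^{p}\left[\frac{\Gamma(u_j+\tfrac12)\,(\rho_1^2r_j^2)^{u_j}}{\Gamma(\tfrac12)\,u_j!}\right],
\]

and in case (b)

\[
C(u_1, u_2, \cdots) = e^{-\frac12\beta_1^2}\,
\frac{\Gamma(\tfrac12 n+t)\,\Gamma(\tfrac12 p)\,\Gamma(\tfrac12 q)}{\Gamma(\tfrac12 n)\,\Gamma(\tfrac12 p+t)\,\Gamma(\tfrac12 q+t)}
\prod_{j=1}^{p}\left[\frac{\Gamma(u_j+\tfrac12)\,(\tfrac12\beta_1^2r_j^2)^{u_j}}{\Gamma(\tfrac12)\,u_j!}\right],
\]

where $u_1 + u_2 + \cdots + u_p$ is denoted by $t$. $\sum_{u_1,u_2,\cdots}$ denotes summation of
all $u$'s from 0 to $\infty$. The solution in either case contains a generalized
hypergeometric function. If we denote the general series

\[
\sum_{u_1,u_2,\cdots}\frac{\Gamma(\alpha_1+t)\,\Gamma(\alpha_2+t)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\,
\frac{\Gamma(\gamma_1)\,\Gamma(\gamma_2)}{\Gamma(\gamma_1+t)\,\Gamma(\gamma_2+t)}
\prod_{j=1}^{p}\left[\frac{\Gamma(\beta_j+u_j)\,x_j^{u_j}}{\Gamma(\beta_j)\,u_j!}\right],
\]

\[
\sum_{u_1,u_2,\cdots}\frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\,
\frac{\Gamma(\gamma_1)\,\Gamma(\gamma_2)}{\Gamma(\gamma_1+t)\,\Gamma(\gamma_2+t)}
\prod_{j=1}^{p}\left[\frac{\Gamma(\beta_j+u_j)\,x_j^{u_j}}{\Gamma(\beta_j)\,u_j!}\right]
\]

by

\[
F(\alpha_1, \alpha_2; \beta_1, \beta_2, \cdots, \beta_p; \gamma_1, \gamma_2; x_1, x_2, \cdots, x_p),
\qquad
F(\alpha; \beta_1, \beta_2, \cdots, \beta_p; \gamma_1, \gamma_2; x_1, x_2, \cdots, x_p)
\]

respectively (see [8, p. 300, example 22]), then we have in case (a)

\[
p(r_i \mid \rho_1 \ne 0) = p(r_i \mid \rho_i = 0)\,(1-\rho_1^2)^{\frac12 n}
\times F(\tfrac12 n, \tfrac12 n; \tfrac12, \tfrac12, \cdots, \tfrac12; \tfrac12 p, \tfrac12 q; \rho_1^2r_1^2, \rho_1^2r_2^2, \cdots, \rho_1^2r_p^2),
\tag{19}
\]

and in case (b)

\[
p(r_i \mid \beta_1 \ne 0) = p(r_i \mid \rho_i = 0)\,e^{-\frac12\beta_1^2}
\times F(\tfrac12 n; \tfrac12, \tfrac12, \cdots, \tfrac12; \tfrac12 p, \tfrac12 q; \tfrac12\beta_1^2r_1^2, \tfrac12\beta_1^2r_2^2, \cdots, \tfrac12\beta_1^2r_p^2).
\tag{20}
\]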


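An alternative operational form is obtained by noting that the sum of terms for
given $t = u_1 + u_2 + \cdots + u_p$ is generated by means of the coefficient of $z^t$ in

\[
\prod_{j=1}^{p}(1 - \rho_1^2r_j^2z)^{-\frac12},
\]

where for definiteness we consider case (a). Hence if we write

\[
F(\alpha_1, \alpha_2; \gamma_1, \gamma_2; x) = \sum_{t}\frac{\Gamma(\alpha_1+t)\,\Gamma(\alpha_2+t)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\,
\frac{\Gamma(\gamma_1)\,\Gamma(\gamma_2)}{\Gamma(\gamma_1+t)\,\Gamma(\gamma_2+t)}\,x^t,
\]

we have

\[
F(\tfrac12 n, \tfrac12 n; \tfrac12, \tfrac12, \cdots, \tfrac12; \tfrac12 p, \tfrac12 q; \rho_1^2r_1^2, \rho_1^2r_2^2, \cdots, \rho_1^2r_p^2)
= \Theta\,F(\tfrac12 n, \tfrac12 n; \tfrac12 p, \tfrac12 q; z^{-1})\prod_{j=1}^{p}(1-\rho_1^2r_j^2z)^{-\frac12},
\tag{21}
\]

where $\Theta$ denotes the operation of taking the term independent of $z$ (this might
possibly be done by multiplication by $z^{-1}$ and evaluation of a suitable contour
integral, but in the use of this formula here the operation $\Theta$ has been carried out
directly).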
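The generating-function step behind (21) can be checked with exact rational arithmetic. The sketch below is an illustration only: it assumes that the weight attached to each $u_j$ is $\Gamma(u_j+\tfrac12)/\{\Gamma(\tfrac12)\,u_j!\}$, which equals $\binom{2u_j}{u_j}/4^{u_j}$, and compares the sum over all $u$'s with $u_1 + \cdots + u_p = t$ against the coefficient of $z^t$ in $\prod_j (1 - x_jz)^{-1/2}$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def half_power_coeff(u):
    """Coefficient of y**u in (1 - y)**(-1/2), i.e. Gamma(u + 1/2) / (Gamma(1/2) u!)."""
    return Fraction(comb(2 * u, u), 4**u)


def coeff_by_composition(xs, t):
    """Sum over u_1 + ... + u_p = t of the product of weights times x_j**u_j."""
    total = Fraction(0)
    for us in product(range(t + 1), repeat=len(xs)):
        if sum(us) == t:
            term = Fraction(1)
            for x, u in zip(xs, us):
                term *= half_power_coeff(u) * x**u
            total += term
    return total


def coeff_by_series(xs, t):
    """Coefficient of z**t in prod_j (1 - x_j z)**(-1/2), by truncated series products."""
    series = [Fraction(1)] + [Fraction(0)] * t
    for x in xs:
        factor = [half_power_coeff(k) * x**k for k in range(t + 1)]
        series = [
            sum(series[i] * factor[k - i] for i in range(k + 1))
            for k in range(t + 1)
        ]
    return series[t]
```

Here `xs` plays the part of the arguments $\rho_1^2r_j^2$; passing `Fraction` values keeps the comparison exact.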

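It is of some interest to examine a simple case, and, incidentally, to check that

\[
\int_{r_i} p(r_i \mid \rho_i) = 1.
\]

If we take $p = 2$, $q = 3$, we obtain for $p(r_1^2, r_2^2 \mid \rho_1 = \rho_2 = 0)$ the form

\[
\tfrac14(n-2)(n-3)(n-4)\,(1-r_1^2)^{\frac12(n-6)}(1-r_2^2)^{\frac12(n-6)}(r_1^2-r_2^2)\,dr_1^2\,dr_2^2.
\]

Considering the distribution (19) with $p = 2$, $q = 3$, and taking the most
elementary case $n = 6$, we obtain on integration of $r_2^2$ from 0 to $r_1^2$,

\[
p(r_1 \mid \rho_1) = 6(r_1^2)^2\,dr_1^2\,(1-\rho_1^2)^3\sum_{u_1,u_2}\left[\frac{\Gamma(3+t)}{\Gamma(3)}\right]^2
\frac{\Gamma(\tfrac32)}{\Gamma(\tfrac32+t)}\,
\frac{\Gamma(\tfrac12+u_1)\,\Gamma(\tfrac12+u_2)\,(\rho_1^2r_1^2)^t}{\Gamma(\tfrac12)\,\Gamma(\tfrac12)\,u_1!\,(u_2+2)!\,t!},
\]

where $t = u_1 + u_2$. Now from the identity $(1-x)^{-\frac12}(1-x)^{\frac32} = 1 - x$, the
coefficient of $x^{t+2}$, ($t \ge 0$), is zero in the expansion of the left-hand side. This
provides the identity, for all $t \ge 0$,

\begin{align*}
\tfrac34\sum_{u_1}\frac{\Gamma(\tfrac12+u_1)\,\Gamma(\tfrac12+t-u_1)}{\Gamma(\tfrac12)\,u_1!\,\Gamma(\tfrac12)\,(u_2+2)!}
&= \tfrac32\,\frac{\Gamma(\tfrac12+t+1)}{\Gamma(\tfrac12)\,(t+1)!} - \frac{\Gamma(\tfrac12+t+2)}{\Gamma(\tfrac12)\,(t+2)!} \\
&= \frac{(t+3)\,\Gamma(\tfrac32+t)}{2\,\Gamma(\tfrac12)\,\Gamma(t+3)},
\end{align*}

or

\[
\sum_{u_1}\frac{\Gamma(\tfrac12+u_1)\,\Gamma(\tfrac12+t-u_1)}{\Gamma(\tfrac12)\,u_1!\,\Gamma(\tfrac12)\,(u_2+2)!}
= \frac{t+3}{3}\,\frac{\Gamma(\tfrac32+t)}{\Gamma(\tfrac32)\,\Gamma(t+3)}.
\]

Hence

\begin{align*}
p(r_1 \mid \rho_1) &= 6(r_1^2)^2\,dr_1^2\,(1-\rho_1^2)^3\sum_{t}\frac{(t+3)\,\Gamma(t+3)}{12\,t!}\,(\rho_1^2r_1^2)^t \\
&= (r_1^2)^2\,dr_1^2\,(1-\rho_1^2)^3\sum_{t}\frac{\Gamma(t+4)}{2\,t!}\,(\rho_1^2r_1^2)^t \tag{23} \\
&= (1-\rho_1^2)^3\,dr_1^2\,\frac{d}{dr_1^2}\bigl\{(r_1^2)^3(1-\rho_1^2r_1^2)^{-3}\bigr\},
\end{align*}

which obviously gives unity on integration of $r_1^2$ from 0 to 1. In purely algebraic
form

\[
p(r_1 \mid \rho_1) = 3(1-\rho_1^2)^3(r_1^2)^2\,dr_1^2/(1-\rho_1^2r_1^2)^4. \tag{24}
\]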
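The closed form (24) lends itself to a direct numerical check. The sketch below is illustrative and rests on the readings used here: the density $3(1-\rho_1^2)^3x^2/(1-\rho_1^2x)^4$ in $x = r_1^2$, and the series form of (23) with coefficients $\Gamma(t+4)/(2\,t!) = \tfrac12(t+1)(t+2)(t+3)$. The function names and the choice of Simpson's rule are assumptions made for the example.

```python
def density_24(x, rho2):
    """Density (24) in x = r_1^2 for p = 2, q = 3, n = 6; rho2 stands for rho_1^2."""
    return 3 * (1 - rho2) ** 3 * x**2 / (1 - rho2 * x) ** 4


def series_23(x, rho2, terms=200):
    """Series form (23): x^2 (1 - rho2)^3 sum_t (t+1)(t+2)(t+3)/2 (rho2 x)^t."""
    total = 0.0
    for t in range(terms):
        total += (t + 1) * (t + 2) * (t + 3) / 2 * (rho2 * x) ** t
    return x**2 * (1 - rho2) ** 3 * total


def simpson(f, a, b, panels=2000):
    """Composite Simpson rule with an even number of panels."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3
```

Integrating `density_24` over $(0, 1)$ with `simpson` for a fixed `rho2` checks the normalization, and comparing `series_23` with `density_24` at a point checks the summation of the series.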

Alternatively, making use of formula (21), we have for the same case $p = 2$,
$q = 3$, $n = 6$, the distribution

\[
6(r_1^2 - r_2^2)\,dr_1^2\,dr_2^2\,(1-\rho_1^2)^3\;\Theta\,F(3, 3; 1, \tfrac32; z^{-1})\,(1-\rho_1^2r_1^2z)^{-\frac12}(1-\rho_1^2r_2^2z)^{-\frac12}.
\tag{25}
\]

Integrating with respect to $r_2^2$ from 0 to $r_1^2$, we obtain

\[
6\,dr_1^2\,(1-\rho_1^2)^3\;\Theta\,F(3, 3; 1, \tfrac32; z^{-1})
\left\{\frac{2r_1^2}{\rho_1^2z} + \frac{4[(1-\rho_1^2r_1^2z)^{\frac32} - 1]}{3\rho_1^4z^2}\right\}(1-\rho_1^2r_1^2z)^{-\frac12}.
\tag{26}
\]

Discarding the term for which the irrational expression $(1-\rho_1^2r_1^2z)^{\frac12}$ cancels,
and hence leaves no terms independent of $z$, we obtain the distribution $p(r_1 \mid \rho_1)$
given in (23) or (24) by selection of the appropriate terms. We may further
integrate directly the expression above with respect to $r_1^2$, and after discarding
again irrelevant terms we obtain

\[
6(1-\rho_1^2)^3\;\Theta\,F(3, 3; 1, \tfrac32; z^{-1})\left\{-\frac{4(1-\rho_1^2z)^{\frac12}}{3\rho_1^4z^2}\right\},
\tag{27}
\]

which is readily ascertained to be unity.
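The operation $\Theta$ pairs the coefficient of $z^{-t}$ in $F(\cdot\,; z^{-1})$ with the coefficient of $z^{t}$ in the accompanying factor, so the final check can be carried out term by term. The sketch below is an illustration under explicit assumptions: it takes the expression being checked to be $6(1-\rho_1^2)^3\,\Theta\,F(3, 3; 1, \tfrac32; z^{-1})\{-4(1-\rho_1^2z)^{1/2}/(3\rho_1^4z^2)\}$, uses the series for $F$ written before (21), and truncates the sum; the function names are hypothetical.

```python
from fractions import Fraction


def rising(a, k):
    """Rising factorial a (a + 1) ... (a + k - 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= a + i
    return out


def f_coeff(t):
    """Coefficient of x**t in F(3, 3; 1, 3/2; x) (no extra 1/t! in this notation)."""
    return rising(Fraction(3), t) ** 2 / (
        rising(Fraction(1), t) * rising(Fraction(3, 2), t)
    )


def bracket_coeff(t, rho2):
    """Coefficient of z**t in -4 (1 - rho2 z)**(1/2) / (3 rho2**2 z**2)."""
    k = t + 2
    # coefficient of y**k in (1 - y)**(1/2) is (-1/2)_k / k!
    binom_term = rising(Fraction(-1, 2), k) / rising(Fraction(1), k)
    return Fraction(-4, 3) * binom_term * rho2**k / rho2**2


def theta_total(rho2, terms=80):
    """Truncated value of 6 (1 - rho2)^3 times the z-free part of the product."""
    series = sum(f_coeff(t) * bracket_coeff(t, rho2) for t in range(terms))
    return 6 * (1 - rho2) ** 3 * series
```

Calling `theta_total` with a `Fraction` value of $\rho_1^2$ below unity gives a partial sum that can be compared with 1.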

6. More than one non-zero root. In the general case the factor multiplying
$p(r_i \mid \rho_i = 0)$ is rather remarkable in being symmetrical in both the set $r_i$ and
the set $\rho_i$. As $n$ increases, the convergence of $r_1$ to $\rho_1$, $r_2$ to $\rho_2$, etc. when the
$\rho_i$ are also arranged in descending order of magnitude must result from the
restriction $r_1 > r_2 > \cdots > r_p$. The limiting distribution has been discussed
by Hsu [9].

In view of the algebraic difficulty of obtaining $\mu(t_1, t_2, \cdots, t_p)$ by direct
integration, an unsymmetric method of obtaining the moments was developed.
This is fairly tractable in the case of two non-zero roots. The second set $w_{2j}$
of the original variables is transformed by an orthogonal transformation such
that the first new variable of the second set is determined by the correlation
between $w_{1j}$ and $w_{2j}$. We may write, for example,


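\[
\begin{aligned}
w'_{21} &= (w_{11}w_{21} + w_{12}w_{22} + \cdots)\big/(w_{11}^2 + w_{12}^2 + \cdots + w_{1p}^2)^{\frac12},\\[4pt]
w'_{22} &= \frac{-\dfrac{(w_{12}^2 + \cdots + w_{1p}^2)\,w_{21}}{w_{11}} + w_{12}w_{22} + \cdots}
{\dfrac{\{(w_{12}^2 + \cdots + w_{1p}^2)(w_{11}^2 + \cdots + w_{1p}^2)\}^{\frac12}}{w_{11}}},\\[4pt]
w'_{23} &= \frac{-\dfrac{(w_{13}^2 + \cdots + w_{1p}^2)\,w_{22}}{w_{12}} + w_{13}w_{23} + \cdots}
{\dfrac{\{(w_{13}^2 + \cdots + w_{1p}^2)(w_{12}^2 + \cdots + w_{1p}^2)\}^{\frac12}}{w_{12}}},
\end{aligned}
\tag{28}
\]

which conversely we can at once express as a relation of the $w_{2j}$ in terms of the
$w'_{2j}$ (since the reciprocal of an orthogonal matrix is simply its transpose). If
we write

\[
\begin{aligned}
\alpha'_{21} &= w'_{21}\big/[(w'_{21})^2 + (w'_{22})^2 + \cdots + (w'_{2p})^2]^{\frac12},\\
\alpha'_{22} &= w'_{22}\big/[(w'_{21})^2 + (w'_{22})^2 + \cdots + (w'_{2p})^2]^{\frac12},
\end{aligned}
\tag{29}
\]

and write further

\[
a_1 = \cos\theta_{11},\; a_2 = \cos\theta_{12},\; \cdots,\; b_1 = \cos\theta'_{21},\; b_2 = \cos\theta'_{22},\; \cdots,
\]

where $\alpha'_{21} = \cos\theta'_{21}$, $\alpha'_{22} = \sin\theta'_{21}\cos\theta'_{22}$, $\cdots$,
we have in particular

\[
\begin{aligned}
\alpha_{21} &= a_1b_1 - b_2\sqrt{(1-a_1^2)}\,\sqrt{(1-b_1^2)},\\
\alpha_{22} &= a_2b_1\sqrt{(1-a_1^2)} + a_1a_2b_2\sqrt{(1-b_1^2)}
- b_3\sqrt{(1-a_2^2)}\,\sqrt{(1-b_1^2)}\,\sqrt{(1-b_2^2)},
\end{aligned}
\tag{30}
\]

where the distribution of the $a$'s and $b$'s is proportional to

\[
\{(1-a_1^2)^{\frac12(p-3)}\,da_1\}\{(1-a_2^2)^{\frac12(p-4)}\,da_2\}\cdots\{(1-b_1^2)^{\frac12(n-3)}\,db_1\}\{(1-b_2^2)^{\frac12(p-4)}\,db_2\}\cdots.
\]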
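The relations (30) can be checked numerically for a particular pair of rows. The sketch below is an illustration only and rests on assumptions: it takes $p = 4$, builds the first three rows of the orthogonal transformation in the form read for (28) (cleared of the divisions by $w_{11}$ and $w_{12}$), and defines $a_1, a_2, b_1, b_2, b_3$ as the cosines described in the text. The function name and the variable names are hypothetical.

```python
from math import sqrt


def check_relation_30(w1, w2):
    """Return ((alpha21, alpha22) from (30), (alpha21, alpha22) computed directly)."""
    p = len(w1)
    # tail sums S[k] = w1[k]^2 + ... + w1[p-1]^2
    S = [sum(v * v for v in w1[k:]) for k in range(p)]
    # first three rows of the orthogonal transformation, as read for (28)
    e1 = [v / sqrt(S[0]) for v in w1]
    e2 = [v / sqrt(S[0] * S[1]) for v in [-S[1]] + [w1[0] * v for v in w1[1:]]]
    e3 = [v / sqrt(S[1] * S[2]) for v in [0.0, -S[2]] + [w1[1] * v for v in w1[2:]]]

    norm2 = sqrt(sum(v * v for v in w2))

    def project(e):
        return sum(x * y for x, y in zip(e, w2)) / norm2

    ap1, ap2, ap3 = project(e1), project(e2), project(e3)  # alpha'_21, alpha'_22, alpha'_23

    a1 = w1[0] / sqrt(S[0])          # cos(theta_11)
    a2 = w1[1] / sqrt(S[1])          # cos(theta_12)
    b1 = ap1                         # cos(theta'_21)
    b2 = ap2 / sqrt(1 - b1**2)       # cos(theta'_22)
    b3 = ap3 / (sqrt(1 - b1**2) * sqrt(1 - b2**2))  # cos(theta'_23)

    alpha21 = a1 * b1 - b2 * sqrt(1 - a1**2) * sqrt(1 - b1**2)
    alpha22 = (
        a2 * b1 * sqrt(1 - a1**2)
        + a1 * a2 * b2 * sqrt(1 - b1**2)
        - b3 * sqrt(1 - a2**2) * sqrt(1 - b1**2) * sqrt(1 - b2**2)
    )
    return (alpha21, alpha22), (w2[0] / norm2, w2[1] / norm2)
```

The two returned pairs should agree to rounding error for any rows `w1`, `w2` of length 4 in general position.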



For the reasons discussed in section 4, it will be noticed that only the
distribution of $b_1$ in the $a$, $b$ set is affected by the linkage factor. By such methods the
expressions

\[
\mu(1, 1) = E\{s_1^2s_2^2 \mid r_i\}, \qquad \mu(2, 1) = E\{s_1^4s_2^2 \mid r_i\}
\]

were fairly readily obtained. If we introduce the notation

\[
S_h \equiv \sum_{i=1}^{p}(r_i^2)^h, \qquad S_{hk} \equiv \sum_{i \ne j}(r_i^2)^h(r_j^2)^k, \quad \text{etc.},
\]

and also symbols for the products of the $\alpha$ and $\beta$ moments, viz.

\[
(2\;\;2) \equiv E\{\alpha_{11}^2\alpha_{21}^2\}\,E\{\beta_{11}^2\beta_{21}^2\}, \qquad
\begin{pmatrix} 2 & \cdot \\ \cdot & 2 \end{pmatrix} \equiv E\{\alpha_{11}^2\alpha_{22}^2\}\,E\{\beta_{11}^2\beta_{22}^2\},
\]

etc., we may list the moments $\mu(t_1, t_2, \cdots, t_p)$ as in Appendix I, which gives all
moments up to the fourth order in terms of the $\alpha$ and $\beta$ moments (the numerical
coefficients arise from the numbers of ways of forming the two-way partitions).
“Half-factors” corresponding to the $\alpha$ moments are listed in Appendix II against
their appropriate symbol, the corresponding factors coming from the $\beta$ moments
being obtained in case (a) by writing $q$ for $p$ and in case (b) by writing also⁵
$n \to \infty$. Thus in case (a)

\[
\begin{aligned}
\mu(1, 1) ={}& \left[\frac{n+2}{np(p+2)}\right]\left[\frac{n+2}{nq(q+2)}\right]S_2 \\
&+ \left(\left[\frac{np+n-2}{np(p+2)(p-1)}\right]\left[\frac{nq+n-2}{nq(q+2)(q-1)}\right]
+ 2\left[\frac{-(n-p)}{np(p+2)(p-1)}\right]\left[\frac{-(n-q)}{nq(q+2)(q-1)}\right]\right)S_{11},
\end{aligned}
\tag{32}
\]

and in case (b)

\[
\mu(1, 1) = \left[\frac{n+2}{np(p+2)}\right]\left[\frac{1}{q(q+2)}\right]S_2
+ \left(\frac{(np+n-2)(q+1) + 2(n-p)}{np(p+2)(p-1)\,q(q+2)(q-1)}\right)S_{11}.
\tag{33}
\]

By means of the transformation (28) it is possible to develop the moments
$\mu(t_1, t_2)$ in the case of two non-zero roots, though in obtaining the results quoted
in Appendix II, where the formulae for $\mu(3, 1)$ and $\mu(2, 2)$ are included, it was
found convenient to supplement this method with the devices mentioned in the
next section. In the case of more than two non-zero roots, it is theoretically
possible to carry out a further transformation on the $w_{3j}$ variates, but with the
“partial” variates $w_{1j\cdot2} \equiv w_{1j} - b_{12}w_{2j}$, where

\[
b_{12} = (w_{11}w_{21} + w_{12}w_{22} + \cdots)/(w_{21}^2 + w_{22}^2 + \cdots),
\]

as coefficients. This enables us to express $w_{3j}$ in terms of new variables, of
which the first is related to the partial correlation of $w_{3j}$ with $w_{1j}$ for given $w_{2j}$,
i.e. to the second correlation factor which depends on the “linkage”; and so on.
This method is, however, again too cumbersome to be of much use, and a more
rapid method of evaluating $\mu(t_1, t_2, \cdots, t_p)$ in general is desirable. This problem
has not been entirely solved to the author’s satisfaction in this paper, although
in the concluding section are mentioned devices which have been found useful,
and which enabled the terms for the remaining third-order moment $\mu(1, 1, 1)$
to be completed and added to Appendix II.

⁵ It should be remembered that we have assumed $p < q$. If $p > q$, we interchange
the dependent and independent vector variates, and hence must interchange $p$ and $q$ in
these moment formulae, $p$ ($< q$) now corresponding to the independent variate.
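The second-order half-factors that enter $\mu(1, 1)$ in case (a) can be written down and checked against two elementary identities: summing $\alpha_{1j}^2\alpha_{2k}^2$ over all $j$, $k$ gives unity, and the squared correlation of two random directions in $n$-space has mean $1/n$. The sketch below is illustrative: the function names are hypothetical, and $S_{11}$ is assumed to be the sum of $r_i^2r_j^2$ over ordered pairs $i \ne j$.

```python
from fractions import Fraction


def half_factors(n, p):
    """Second-order alpha moments, as read from the brackets of (32)."""
    same = Fraction(n + 2, n * p * (p + 2))                    # E{a11^2 a21^2}
    diff = Fraction(n * p + n - 2, n * p * (p + 2) * (p - 1))  # E{a11^2 a22^2}
    cross = Fraction(-(n - p), n * p * (p + 2) * (p - 1))      # E{a11 a12 a21 a22}
    return same, diff, cross


def mu_11_case_a(n, p, q, r2):
    """mu(1, 1) in case (a) for squared sample correlations r2 = [r_1^2, ..., r_p^2].

    Assumption: S_11 is summed over ordered pairs i != j.
    """
    s2 = sum(x * x for x in r2)
    s11 = sum(x * y for i, x in enumerate(r2) for j, y in enumerate(r2) if i != j)
    ap, bp, dp = half_factors(n, p)
    aq, bq, dq = half_factors(n, q)  # beta factors: write q for p
    return ap * aq * s2 + (bp * bq + 2 * dp * dq) * s11
```

With `same, diff, cross = half_factors(n, p)`, the two identities read `p * same + p * (p - 1) * diff == 1` and `p * same + p * (p - 1) * cross == Fraction(1, n)`.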

7. Relations among the $\alpha$-moments. Equation (15) defining the $\alpha$'s, the
$\xi_i$ being random vectors in the $p$-space of the $x$-vectors except for their mutual
configuration being determined by the properties of $n$-space, may be used to
provide relations among the $\alpha$-moments. Thus in addition to the identities

\[
\alpha_{i1}^2 + \alpha_{i2}^2 + \cdots + \alpha_{ip}^2 = 1, \qquad (i = 1, 2, \cdots, p), \tag{34}
\]

the correlation of any $\xi_i$ with a fixed vector in the $p$-space, e.g. with $x_1$ or with
$(x_1 + x_2)/\sqrt{2}$, is a random correlation in $p$-space, whereas the correlation of any
$\xi_i$ with any other $\xi_j$ is a random correlation in $n$-space. The use of these facts
is best illustrated by an example, and equations sufficient to determine the
six $\alpha$-moments required for $\mu(1, 1, 1)$ will be derived.

For convenience, denote the required mean values of

\[
\alpha_{11}^2\alpha_{21}^2\alpha_{31}^2,\quad \alpha_{11}^2\alpha_{21}^2\alpha_{32}^2,\quad \alpha_{11}^2\alpha_{22}^2\alpha_{33}^2,\quad
\alpha_{11}\alpha_{12}\alpha_{21}\alpha_{22}\alpha_{31}^2,\quad \alpha_{11}\alpha_{12}\alpha_{21}\alpha_{22}\alpha_{33}^2,\quad
\alpha_{11}\alpha_{12}\alpha_{21}\alpha_{23}\alpha_{32}\alpha_{33}
\]

by $A$, $B$, $C$, $D$, $E$, $F$ respectively. Multiply the second-order quantities $\alpha_{11}^2\alpha_{21}^2$,
$\alpha_{11}^2\alpha_{22}^2$, $\alpha_{11}\alpha_{12}\alpha_{21}\alpha_{22}$ by expression (34) for $i = 3$; since this expression is identically
unity, the consequent mean values are unaltered. This gives the three relations

\[
\begin{aligned}
A + (p-1)B &= (n+2)/\{np(p+2)\},\\
A + 3(p-1)B + (p-1)(p-2)C &= 1/p,\\
A + (p-1)B + 2(p-1)D + (p-1)(p-2)E &= 1/(np).
\end{aligned}
\tag{35}
\]

The moment $A$ is the mean of the triple product of the squared scalar products
of $\xi_1$, $\xi_2$ and $\xi_3$ with $x_1$. The same value must be realized with any other fixed
vector in the $p$-space, e.g. with either $(x_1 + x_2)/\sqrt{2}$ or with $(x_1 + x_2 + \cdots + x_p)/\sqrt{p}$.
This gives two relations

\[
\begin{aligned}
A - B - 4D &= 0,\\
(p+1)A - 3B - 12D - (p-2)(C + 6E + 8F) &= 0.
\end{aligned}
\tag{36}
\]





A final linearly independent relation is obtained from the mean triple product of
$(\xi_1 \cdot \xi_2)$, $(\xi_1 \cdot \xi_3)$, $(\xi_2 \cdot \xi_3)$, which depends solely on the internal configuration of
$\xi_1$, $\xi_2$ and $\xi_3$, and is easily shown (e.g. choose $\xi_1$ to coincide with one of the original
axes of the n-space) to be $1/n^2$. This gives

$$pA + 3p(p-1)D + p(p-1)(p-2)F = 1/n^2. \tag{37}$$

The equations contained in (35), (36) and (37) determine A, B, C, D, E and F.
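As a check on the algebra, the six relations can be solved exactly for particular n and p. The sketch below is an illustration only and not part of the original derivation: the function name `solve_alpha_moments` and the use of Gaussian elimination over Python's `Fraction` type are assumptions made here, and the third relation of (35) is entered in the form A + (p-1)B + 2(p-1)D + (p-1)(p-2)E = 1/(np), which is what multiplying $\alpha_{11}\alpha_{12}\alpha_{21}\alpha_{22}$ by (34) yields.

```python
from fractions import Fraction


def solve_alpha_moments(n, p):
    """Solve relations (35)-(37) for A..F with exact rational arithmetic."""
    n, p = Fraction(n), Fraction(p)
    # Columns: A, B, C, D, E, F, then the right-hand side.
    rows = [
        [1, p - 1, 0, 0, 0, 0, (n + 2) / (n * p * (p + 2))],
        [1, 3 * (p - 1), (p - 1) * (p - 2), 0, 0, 0, 1 / p],
        [1, p - 1, 0, 2 * (p - 1), (p - 1) * (p - 2), 0, 1 / (n * p)],
        [1, -1, 0, -4, 0, 0, 0],
        [p + 1, -3, -(p - 2), -12, -6 * (p - 2), -8 * (p - 2), 0],
        [p, 0, 0, 3 * p * (p - 1), 0, p * (p - 1) * (p - 2), 1 / n**2],
    ]
    m = [[Fraction(v) for v in row] for row in rows]
    size = len(m)
    for col in range(size):
        # Gauss-Jordan step: find a non-zero pivot, normalise, eliminate.
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(size):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return dict(zip("ABCDEF", (row[-1] for row in m)))


if __name__ == "__main__":
    for name, value in solve_alpha_moments(10, 4).items():
        print(name, value)
```

The values chosen (n = 10, p = 4) are arbitrary; any p large enough that the factors (p - 1) and (p - 2) do not vanish should serve equally well.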

Similar equations could evidently be constructed for the higher-order moments,
e.g. for the terms required for $\mu(2, 1, 1)$ or $\mu(1, 1, 1, 1)$, but the numbers of such
terms increase rapidly. From Appendix I it will be seen that there are 24
distinct terms in $\mu(2, 1, 1)$ and 16 in $\mu(1, 1, 1, 1)$.


Appendix I. 


- S.0)+ 2S,i{( 2 2 ) + 2 C l)} 

11 “( 2 )+{(' 2 ) + °(■> 2 ) +8 (> 0} 

+ “»{ 3 C 2 2) + 12 G 1 2 )} 

.(3,1) + 2 ) + (j 2 ) + 12 (i I)} 

+ 2S »H 4 2 ) + 2 °(i 1)} 

+ 2S».{l5( 4 2 2 )+ 45( 2 2 2 )+12o( 3 J 2 ) + 30([ J 

+ 24S„„{l5 ( 2 2 2 2 ) + 90(j ; 2 2 )} 

B (2,2) - &(*) + «» {l2 ( 4 2 ) + 16 (| })} 

+2&,{( 4 4 )+i 8 ( 2 2 )+u(; 

+ 24l “{ 6 ( 4 2 2) + 36 ( 2 2 2 ) +72 G i O + Kl 1 

+ 24S«ii {® (? 2 2 2) 22 G 1 2 2 ) + 24 G I ! 






m( 1, l,l)-s« 2 +4 3 2 • + 12 1 1 


+ 6/Sm 


2 ■ • 


+61 1 • +8 • 1 1 


m(2, 1, 1) — S4 [ 2 J + Sji i 2 [ 2 •1 + 6(2 


+ 16 1 1+41 1 +2& • 2+62 



+ 16 1 1 + 12 1 1 +2Sm • 2 • +12(2 




1 1 2 ' 


+ 16 1 1 • + 2 ■ 1 1 +24 1 1 


2 1 1 


+ 48 ^1 1 • J + 48 ^1 1 • J + 24 ^ 1 ij + 32 y 1 1 

f /2 2 • A /2 2 • A /I 1 1 1\ 


2 1 1 


+ 24/Sjjh ■{ 3 


1 1 +24 1 1 


/2 1 1 A /I 1 • 2' 

+ 24 1 - 1 1 -J + 48I- 1 1 • 


2] f [2 0 

m(1 , 1,1,1) = s t ^l + sA 2 + 24 


2 • 1 1 


f (2 •' 


+ 2 s 2 ^3 : +24 ; ; + 8 


1 1 1 ~ 1 1 

11 11 

/ \ ) J 


+ 2/S 2 n < 6 


1 • 1 

1 1 • 

1 1 • 

1 • 1 






f 1 1 •] 


f 2 • 


+ 96 

• i i 
i • i 

+ 12 

• 1 1 

• 1 1 



12 • 'J 


2 . . 



+ 24<Sim <1 


+ 12 


1 1 
1 1 


1 1 


+ 32 


1 1 • 

• 1 1 

1 • 1 


+ 12 


i ij 


l- • 

• 2J 



[2 • • -1 



\l 1 

.1 


1 1 

• 1 1 • 

+ 48 

• 1 

• 

1 1 


1- • • 2j 



.1 • 

• lj 

J 


Appendix II. 

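$$\begin{pmatrix}2\\2\end{pmatrix} = \frac{n+2}{np(p+2)},\quad
\begin{pmatrix}2&\cdot\\\cdot&2\end{pmatrix} = \frac{np+n-2}{np(p+2)(p-1)},\quad
\begin{pmatrix}1&1\\1&1\end{pmatrix} = \frac{-(n-p)}{np(p+2)(p-1)},$$

$$\begin{pmatrix}4\\2\end{pmatrix} = \frac{3(n+4)}{np(p+2)(p+4)},\quad
\begin{pmatrix}4&\cdot\\\cdot&2\end{pmatrix} = \frac{3(np+3n-4)}{np(p+2)(p+4)(p-1)},$$

$$\begin{pmatrix}2&2\\2&\cdot\end{pmatrix} = \frac{np+n+2p-4}{np(p+2)(p+4)(p-1)},\quad
\begin{pmatrix}2&2&\cdot\\\cdot&\cdot&2\end{pmatrix} = \frac{np+3n-4}{np(p+2)(p+4)(p-1)},$$

$$\begin{pmatrix}3&1\\1&1\end{pmatrix} = \frac{-3(n-p)}{np(p+2)(p+4)(p-1)},\quad
\begin{pmatrix}1&1&2\\1&1&\cdot\end{pmatrix} = \frac{-(n-p)}{np(p+2)(p+4)(p-1)},$$

$$\begin{pmatrix}6\\2\end{pmatrix} = \frac{15(n+6)}{np(p+2)(p+4)(p+6)},\quad
\begin{pmatrix}6&\cdot\\\cdot&2\end{pmatrix} = \frac{15(np+5n-6)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}4&2\\2&\cdot\end{pmatrix} = \frac{3(np+n+4p-6)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}4&2\\\cdot&2\end{pmatrix} = \frac{3(np+3n+2p-6)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}4&2&\cdot\\\cdot&\cdot&2\end{pmatrix} = \frac{3(np+5n-6)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}2&2&2\\2&\cdot&\cdot\end{pmatrix} = \frac{np+3n+2p-6}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}2&2&2&\cdot\\\cdot&\cdot&\cdot&2\end{pmatrix} = \frac{np+5n-6}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}5&1\\1&1\end{pmatrix} = \frac{-15(n-p)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}3&3\\1&1\end{pmatrix} = \frac{-9(n-p)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}3&1&2\\1&1&\cdot\end{pmatrix} = \frac{-3(n-p)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}1&1&4\\1&1&\cdot\end{pmatrix} = \frac{-3(n-p)}{np(p+2)(p+4)(p+6)(p-1)},$$

$$\begin{pmatrix}1&1&2&2\\1&1&\cdot&\cdot\end{pmatrix} = \frac{-(n-p)}{np(p+2)(p+4)(p+6)(p-1)},$$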

/ 4\_ 9(w + 4 )(n + 6)_ 

\4/ n(n + 2)p(p + 2)(p + 4)(p + 6) ’ 

/4 •y 9(n , (p + 3)(p + 5) +2n(p + l)(p + 3) - 8(2p + 3)} 

\- Vj n(n + 2)p(p + 2)(p + 4)(p + 6)(p - l)(p + 1) 

/2 2\ 3{n 2 (p -f- 3) + 6n(p + 1) + 8(p — 3)} 

\4 •/ n(n + 2)p(p + 2)(p + 4)(p + 6)(p - 1) ’ 

/2 2\ n 2 (p 2 + 4p + 15) + 6 n(p + 1 )(p - 3) + 4(5p 2 + 2p - 6) 

\2 2/ n(n + 2)p(p + 2)(p + 4)(p + 6)(p - l)(p + 1) 

/4 • -\ 3jw 2 (p + 3)(p + 5) + 2 r(p + l)(p + 3 ) — 8(2p + 3)} 

\- 2 2/ n(n + 2 )p(p + 2)(p + 4)(p + 6)(p — l)(p + 1) 

(2 2 A n. 2 (p + 3) 2 + 2w(p + l)(2p + 3) + 4(p 2 - 4p - 6) 

\2 • 2/ n(n + 2)p(p + 2)(p + 4)(p + 6)(p - l)(p + 1) 

(2 2 • A n 2 (p + 3)( p + 5) + 2n(p + l)(p + 3) - 8(2p + 3) 

\- -2 2/ n(n + 2)p(p + 2)(p + 4)(p + 6)(p - l)(p + 1) ’ 

/3 l\ —9 in — p)(n + 4) 

\3 l/n(n+ 2)p(p + 2)(p + 4)(p + 6)(p - 1) ’ 

/3 l\ —9(n — p)(np + 3» + 2p ) 

\1 3/ n(n + 2)p(p + 2)(p + 4)(p + 6)(p - l)(p + 1) ’ 

/1 1 2\_— ( u — p)(wp — 3n + 8p + 12) _ 

\1 1 2/ «(« + 2)p(pT 2Xp + 4)(p + 6)(p - l)(p + 1) ’ 

/3 1 A_ —3(w — p)(np + 3n + 2p) 

\1 1 2/ n(n + 2)p(p - 2)(p + 4)(p + 6)(p - l)(p + f)’ 

/1 12 A__ — (n — p)( n p + 3n + 2p) _ 

\1 1 • 2/ n(n + 2)p(p~+ 2)(p”+ 4)(p + 6)(p - l)(p + 1) ’ 

/l 1 1 l\_ 3(n - p)(n - p - 2) _ 

\1 1 1 1/n(n + 2)p(p + 2)(p + 4)(p + 6)(p — l)(p + 1) ’ 





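$$\begin{pmatrix}2\\2\\2\end{pmatrix} = \frac{(n+2)(n+4)}{n^2p(p+2)(p+4)},\quad
\begin{pmatrix}2&\cdot\\2&\cdot\\\cdot&2\end{pmatrix} = \frac{(n+2)(np+3n-4)}{n^2p(p+2)(p+4)(p-1)},$$

$$\begin{pmatrix}2&\cdot&\cdot\\\cdot&2&\cdot\\\cdot&\cdot&2\end{pmatrix} = \frac{n^2(p^2+3p-2) - 6n(p+2) + 16}{n^2p(p+2)(p+4)(p-1)(p-2)},$$

$$\begin{pmatrix}1&1\\1&1\\2&\cdot\end{pmatrix} = \frac{-(n-p)(n+2)}{n^2p(p+2)(p+4)(p-1)},\quad
\begin{pmatrix}1&1&\cdot\\1&1&\cdot\\\cdot&\cdot&2\end{pmatrix} = \frac{-(n-p)(np+2n-4)}{n^2p(p+2)(p+4)(p-1)(p-2)},$$

$$\begin{pmatrix}1&1&\cdot\\\cdot&1&1\\1&\cdot&1\end{pmatrix} = \frac{(n-p)(2n-p)}{n^2p(p+2)(p+4)(p-1)(p-2)}.$$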


REFERENCES

[1] H. Hotelling, "Relations between two sets of variates", Biometrika, Vol. 28 (1936),
pp. 321-377.

[2] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear
equations", Ann. Eugen., Vol. 9 (1939), pp. 238-249.

[3] P. L. Hsu, "On the distribution of roots of certain determinantal equations", Ann.
Eugen., Vol. 9 (1939), pp. 250-258.

[4] S. N. Roy, "p-statistics or some generalizations in analysis of variance appropriate to
multivariate problems", Sankhyā, Vol. 4 (1939), pp. 381-396.

[5] S. N. Roy, "Analysis of variance for multivariate normal populations. The sampling
distribution of the requisite p-statistics on the null and non-null hypothesis",
Sankhyā, Vol. 6 (1942), pp. 35-50.

[6] M. S. Bartlett, "The vector representation of a sample", Proc. Camb. Phil. Soc.,
Vol. 30 (1934), pp. 327-360.

[7] R. A. Fisher, "The general sampling distribution of the multiple correlation coefficient",
Proc. Roy. Soc., Vol. A 121 (1928), pp. 654-673.

[8] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge Univ. Press, 4th
ed., 1935.

[9] P. L. Hsu, "On the limiting distribution of the canonical correlations", Biometrika,
Vol. 32 (1941), pp. 38-45.



ON THE THEORY OF MARKOFF CHAINS 


By Elliott W. Montroll 
University of Pittsburgh 

1. Summary. Although there exists a voluminous literature on the theory of
probability of independent events, and powerful techniques have been developed
for the analysis of most of the interesting problems in this field, the theory of
probability of dependent events has been rather neglected. The first detailed
investigations in this subject were published by A. Markoff [1]. S. Bernstein [2]
has extended the fundamental limit theorems to chains of dependent events.
The most extensive exposition of this field has been made by M. Fréchet [3].

In the present paper we shall develop methods of averaging functions over 
chains of dependent variables and find the probability distribution of these 
functions. It will be shown that for certain types of chains these averages and 
distribution functions can be expressed in terms of the characteristic values and 
vectors of a certain operator equation. Many of the methods discussed here 
have been applied to problems in statistical mechanics [4,5, 6,7,8]. The most 
important application has been made by L. Onsager [8] who proved rigorously 
(on the basis of a simplified model) that Boltzmann’s energy distribution in a 
solid with cooperative elements leads to a phase transition. The first explicit 
application of linear operator theory (through matrices and integral equations) to 
probability chains has apparently been made by Hostinsky [9]. 

2. Introductory Remarks. Suppose there exists a chain of events each of
which might lead to one of $\nu$ possible results, and which are correlated in such a
manner that the probability of n successive events leading to a chain of results

$$\alpha_1, \alpha_2, \cdots, \alpha_n$$

is proportional to

$$P_n(\alpha_1, \alpha_2, \cdots, \alpha_n).$$

The probability of a given function $F(\alpha_1, \alpha_2, \cdots, \alpha_n)$ having a value corresponding
to the sequence of α's would be proportional to

$$F(\alpha_1, \alpha_2, \cdots, \alpha_n)\,P_n(\alpha_1, \cdots, \alpha_n)$$

and its average value over all configurations of the chain would be

$$\bar F = F_1/F_0 = \sum_{\{\alpha_i\}} F(\alpha_1, \alpha_2, \cdots, \alpha_n)\,P_n(\alpha_1, \alpha_2, \cdots, \alpha_n) \Big/ \sum_{\{\alpha_i\}} P_n(\alpha_1, \cdots, \alpha_n) \tag{1}$$

where

$$F_m = \sum_{\{\alpha_i\}} F^m(\alpha_1, \alpha_2, \cdots, \alpha_n)\,P_n(\alpha_1, \cdots, \alpha_n) \tag{1a}$$

and the summation extends over all values of

$$\{\alpha_i\} = (\alpha_1, \alpha_2, \cdots, \alpha_n).$$

The probability of a result $\alpha_1$ of the first event leading to a result $\alpha_n$ of the
nth event is

$$P_n(\alpha_1, \alpha_n) = (1/F_0) \sum_{\alpha_2, \cdots, \alpha_{n-1}} P_n(\alpha_1, \alpha_2, \cdots, \alpha_n). \tag{2}$$

In order to find the probability of a given function $F(\alpha_1, \cdots, \alpha_n)$ having a
value between $\xi$ and $\xi + h$ it is useful to know the moments and Thiele
semi-invariants of $F(\alpha_1, \cdots, \alpha_n)$. Both of these functions of F can be calculated
from

$$Z_n(x) = \sum_{\{\alpha_i\}} P_n(\alpha_1, \cdots, \alpha_n) \exp\{xF(\alpha_1, \alpha_2, \cdots, \alpha_n)\}. \tag{3}$$

Obviously

$$F_m = \lim_{x \to 0} d^m Z_n(x)/dx^m. \tag{4}$$

It is known [10] that the mth Thiele semi-invariant is given by

$$\Lambda_m = \lim_{x \to 0} d^m \log Z_n(x)/dx^m. \tag{5}$$

In the notation of Cramér, $Z_n(i\omega)/Z_n(0) = f(\omega)$, the characteristic function of F.

If $G(x)$ is defined so that $G(\xi + h) - G(\xi)$ is the probability that the function
$F(\alpha_1, \cdots, \alpha_n)$ has a value $\xi < F(\alpha_1, \cdots, \alpha_n) < \xi + h$, then it is well
known [5] that if $G(x)$ is continuous at $x = \xi$ and $x = \xi + h$

$$G(\xi + h) - G(\xi) = \frac{1}{2\pi i} \lim_{T \to \infty} \int_{-T}^{T} \frac{e^{-i\omega\xi}\,(1 - e^{-i\omega h})}{\omega} \exp[\log f(\omega)]\, d\omega \tag{6}$$

where

$$\log f(\omega) = \sum_{m=1}^{l} \Lambda_m (i\omega)^m/m! + o(\omega^l). \tag{6a}$$

When the derivative of $G(\xi)$ with respect to $\xi$ exists, the probability of

$$F(\alpha_1, \cdots, \alpha_n)$$

having a value between $\xi$ and $\xi + d\xi$ is

$$g(\xi)\, d\xi = (dG/d\xi)\, d\xi = \frac{d\xi}{2\pi} \lim_{T \to \infty} \int_{-T}^{T} \exp\Big\{\sum_{m=1}^{\infty} \Lambda_m (i\omega)^m/m!\Big\}\, e^{-i\omega\xi}\, d\omega. \tag{6b}$$

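From (4)

$$\sum_{m=1}^{\infty} \Lambda_m (i\omega)^m/m! = -\log Z_n(0) + \lim_{x \to 0} e^{i\omega\, \partial/\partial x} \log Z_n(x). \tag{7}$$

Since, for a constant c independent of x,

$$e^{c\, \partial/\partial x} f(x) = f(x + c)$$

we have

$$\sum_{m=1}^{\infty} \Lambda_m (i\omega)^m/m! = \log\{Z_n(i\omega)/Z_n(0)\}, \tag{8}$$

and from (6)

$$G(\xi + h) - G(\xi) = \frac{1}{2\pi i} \lim_{T \to \infty} \int_{-T}^{T} \frac{e^{-i\omega\xi}\,(1 - e^{-i\omega h})\, Z_n(i\omega)}{\omega\, Z_n(0)}\, d\omega. \tag{9}$$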

Equations (3), (4), (5) and (9) indicate that much information concerning a
chain of correlated events can be obtained from a knowledge of $Z_n(x)$. We shall
now introduce procedures for the determination of $Z_n(x)$ for several general forms
of $P_n(\alpha_1, \cdots, \alpha_n)$.

When α is a continuous variable, the results of this section and those to follow
are easily generalized by replacing the summation operations over all values
of the α's by integrals, and by replacing the matrix equations of the next section
by integral equations.


3. Simple Chains, $P_n(\alpha_1, \cdots, \alpha_n) = \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1})$.

a. General theory. By a simple chain we shall mean a sequence of events,
each of which leads to one of $\nu$ possible results and which occur in such a manner
that if the result of the kth event is $\alpha_k$, the probability of the (k + 1)st one
yielding a result $\alpha_{k+1}$ is proportional to $p(\alpha_k, \alpha_{k+1})$. This implies that the
probability of the occurrence of the sequence of results

$$\alpha_1, \alpha_2, \cdots, \alpha_n$$

is

$$\prod_{i=1}^{n-1} p(\alpha_i, \alpha_{i+1}) \Big/ \sum_{\{\alpha_i\}} \prod_{i=1}^{n-1} p(\alpha_i, \alpha_{i+1}), \tag{10}$$

and the probability of a first result $\alpha_1$ leading to an nth result $\alpha_n$ is

$$P_n(\alpha_1, \alpha_n) = \sum_{\alpha_2, \cdots, \alpha_{n-1}} \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1}) \Big/ \sum_{\alpha_1, \cdots, \alpha_n} \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1}). \tag{11}$$

The summations are to be extended over all $\nu$ possible values of each $\alpha_j$ indicated
on the summation indices. Chains of this type are sometimes called simple
Markoff chains after the first author who studied them systematically.

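From (1), the average value of a function $F(\alpha_1, \cdots, \alpha_n)$ is

$$F_1/F_0 = \frac{\displaystyle\sum_{\alpha_1} \cdots \sum_{\alpha_n} F(\alpha_1, \cdots, \alpha_n) \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1})}{\displaystyle\sum_{\alpha_1} \cdots \sum_{\alpha_n} \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1})}. \tag{12}$$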


Many chain functions $F(\alpha_1, \cdots, \alpha_n)$ of interest are either additive or
multiplicative and of one of the forms

$$\text{a)}\quad F_1(\alpha_1, \cdots, \alpha_n) = h(\alpha_1, \alpha_2) + h(\alpha_2, \alpha_3) + \cdots + h(\alpha_{n-1}, \alpha_n) \tag{13a}$$

$$\text{b)}\quad F_2(\alpha_1, \cdots, \alpha_n) = g(\alpha_1, \alpha_2)\, g(\alpha_2, \alpha_3) \cdots g(\alpha_{n-1}, \alpha_n). \tag{13b}$$

In case (b) it is convenient to define a new function $h(\alpha_i, \alpha_j)$ by

$$g(\alpha_i, \alpha_j) = \exp[x h(\alpha_i, \alpha_j)] \tag{14}$$

and in both cases to consider a function of the form

$$Z_n(x) = \sum_{\{\alpha_i\}} \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1}) \exp[x h(\alpha_j, \alpha_{j+1})], \tag{15}$$

for then the values of $F_1$ and $F_2$ averaged over the entire chain are given by

$$\langle F_1 \rangle_{\text{av.}} = \lim_{x \to 0} d \log Z_n(x)/dx \tag{16a}$$

and 


(16b) 


<F 2 > av . = 


When n is large, the direct evaluation of (15) may become quite difficult
because of the large number of variables involved. As an alternative we shall
now introduce a procedure that is based on the observation that $Z_n(x)$ is the
sum of the elements of the (n - 1)th power of the matrix

$$P_x = \begin{pmatrix}
p_x(1, 1) & p_x(1, 2) & \cdots & p_x(1, \nu)\\
p_x(2, 1) & p_x(2, 2) & \cdots & p_x(2, \nu)\\
\vdots & \vdots & & \vdots\\
p_x(\nu, 1) & p_x(\nu, 2) & \cdots & p_x(\nu, \nu)
\end{pmatrix} \tag{17}$$

where the elements $p_x(\alpha, \beta)$ are defined as

$$p_x(\alpha, \beta) = p(\alpha, \beta) \exp[x h(\alpha, \beta)]. \tag{18}$$

α and β range over the same set of values as one of the "result" parameters
$\alpha_j$; and each of the $\nu$ possible results is represented by a unique integer of the set
$1, 2, \cdots, \nu$. Thus $Z_n(x)$ = sum of elements of $P_x^{n-1}$. To employ this observation
to advantage, let us consider the characteristic values and vectors of the
matrix $P_x$. It is well known that if the characteristic values are simple the
characteristic vectors form a biorthogonal set; that is, if

$$\Phi_{i,x} = \{\varphi_{i,x}(1), \varphi_{i,x}(2), \cdots, \varphi_{i,x}(\nu)\}, \qquad (i = 1, 2, \cdots, \nu), \tag{19a}$$

and

$$\Psi_{i,x} = \begin{pmatrix} \psi_{i,x}(1)\\ \psi_{i,x}(2)\\ \vdots\\ \psi_{i,x}(\nu) \end{pmatrix} \tag{19b}$$

satisfy the operator equations

$$\Phi_{i,x} \cdot P_x = \lambda_{i,x} \Phi_{i,x} \tag{20a}$$

$$P_x \cdot \Psi_{i,x} = \lambda_{i,x} \Psi_{i,x} \tag{20b}$$

where $\lambda_{i,x}$ is the ith characteristic value of (17), then

$$\Phi_{i,x} \cdot \Psi_{j,x} = \sum_{\alpha=1}^{\nu} \varphi_{i,x}(\alpha)\, \psi_{j,x}(\alpha) = 0 \quad \text{when } i \neq j.$$

We shall for convenience always assume that the φ's and ψ's are normalized:

$$\Phi_{i,x} \cdot \Psi_{i,x} = 1$$

so that in general:

$$\Phi_{i,x} \cdot \Psi_{j,x} = \delta_{ij} = \begin{cases} 0 & \text{when } i \neq j\\ 1 & \text{when } i = j. \end{cases} \tag{21}$$

It is well known from matrix theory that one can expand a matrix element as

$$p_x(\alpha, \beta) = \sum_{i=1}^{\nu} \lambda_{i,x}\, \psi_{i,x}(\alpha)\, \varphi_{i,x}(\beta) \tag{22}$$

and that

$$\lambda_{i,x} = \Phi_{i,x} \cdot P_x \cdot \Psi_{i,x}. \tag{23}$$

By substituting (22) into the expression for $Z_n(x)$ in terms of $P_x^{n-1}$, one can
show that

$$Z_n(x) = \sum_{i=1}^{\nu} \{\lambda_{i,x}\}^{n-1} \Big\{\sum_{\alpha} \varphi_{i,x}(\alpha)\Big\} \Big\{\sum_{\alpha} \psi_{i,x}(\alpha)\Big\}. \tag{24}$$

Therefore $Z_n(x)$ can be determined from a knowledge of the characteristic vectors
and values of the matrix $P_x$.
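The observation that $Z_n(x)$ equals the sum of the elements of $P_x^{n-1}$ can be checked numerically on a small chain. The following is a minimal sketch under stated assumptions: the function names `z_brute_force` and `z_matrix_power`, the zero-based indexing of results, and the sample weights in the usage block are all illustrative choices made here, not taken from the paper.

```python
import itertools
import math


def z_brute_force(p, h, x, n):
    """Z_n(x) from definition (15): a sum over all nu**n chains of results."""
    nu = len(p)
    total = 0.0
    for chain in itertools.product(range(nu), repeat=n):
        weight = 1.0
        for a, b in zip(chain, chain[1:]):
            weight *= p[a][b] * math.exp(x * h[a][b])
        total += weight
    return total


def z_matrix_power(p, h, x, n):
    """Z_n(x) as the sum of the elements of the (n - 1)th power of P_x."""
    nu = len(p)
    px = [[p[a][b] * math.exp(x * h[a][b]) for b in range(nu)] for a in range(nu)]
    row = [1.0] * nu  # the row vector 1, multiplied by P_x once per link
    for _ in range(n - 1):
        row = [sum(row[a] * px[a][b] for a in range(nu)) for b in range(nu)]
    return sum(row)


if __name__ == "__main__":
    weights = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]]
    steps = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(z_brute_force(weights, steps, 0.3, 5))
    print(z_matrix_power(weights, steps, 0.3, 5))
```

The brute-force sum needs $\nu^n$ terms while the matrix form needs about $n\nu^2$ multiplications, which is the practical point of the reduction.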

If there exists a largest characteristic root $\lambda_{L,x}$ such that

$$\lambda_{L,x} > |\lambda_{i,x}| \quad \text{if } i \neq L, \tag{25}$$

one can obtain some interesting results. Before deriving these, we shall give a
sufficient condition (which is satisfied in many chains) for the existence of this
inequality. Frobenius [11] has shown that if all the elements of a finite matrix
are > 0, then the characteristic value of largest absolute value of the matrix is
real, positive, and simple (nondegenerate). Thus, as long as $\nu$ is finite and
$p_x(\alpha, \beta) > 0$ for all α and β, (25) is valid.

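We shall now prove that

$$\lim_{n \to \infty} \frac{Z_n(x)}{\lambda_{L,x}^{n-1} (\Phi_{L,x} \cdot 1)(1 \cdot \Psi_{L,x})} = 1; \tag{25a}$$

that is,

$$Z_n(x) \sim \lambda_{L,x}^{n-1} (\Phi_{L,x} \cdot 1)(1 \cdot \Psi_{L,x}). \tag{25b}$$

First let us consider the case in which $P_x$ is a symmetrical matrix. Then
$\varphi_{j,x}(\alpha) = \psi_{j,x}(\alpha)$, all the characteristic values are real, and

$$Z_n(x) = \lambda_{L,x}^{n-1} (\Phi_{L,x} \cdot 1)^2 + \sum_{i \neq L} \lambda_{i,x}^{n-1} (\Phi_{i,x} \cdot 1)^2.$$

From Cauchy's inequality and (21)

$$(\Phi_{j,x} \cdot 1)^2 = \Big[\sum_{\alpha=1}^{\nu} \varphi_{j,x}(\alpha)\Big]^2 \leq \nu \sum_{\alpha=1}^{\nu} \varphi_{j,x}^2(\alpha) = \nu.$$

Therefore,

$$\Big|\sum_{i \neq L} \lambda_{i,x}^{n-1} (\Phi_{i,x} \cdot 1)^2\Big| \leq \nu \sum_{i \neq L} |\lambda_{i,x}^{n-1}| \leq \nu(\nu - 1)\, |\lambda_{s,x}^{n-1}|$$

where $\lambda_{s,x}$ is the characteristic value of $P_x$ second largest in absolute value.
This inequality yields

$$\left|\frac{Z_n(x)}{\lambda_{L,x}^{n-1} (\Phi_{L,x} \cdot 1)^2} - 1\right| \leq \frac{\nu(\nu - 1)}{(\Phi_{L,x} \cdot 1)^2} \left|\frac{\lambda_{s,x}}{\lambda_{L,x}}\right|^{n-1} \tag{25c}$$

and (25a) (since $|\lambda_{s,x}/\lambda_{L,x}| < 1$) follows. When $P_x$ is not symmetrical, one can
easily derive the analogous expression

$$\left|\frac{Z_n(x)}{\lambda_{L,x}^{n-1} (\Phi_{L,x} \cdot 1)(1 \cdot \Psi_{L,x})} - 1\right| \leq \frac{A(\nu - 1)}{|(\Phi_{L,x} \cdot 1)(1 \cdot \Psi_{L,x})|} \left|\frac{\lambda_{s,x}}{\lambda_{L,x}}\right|^{n-1}$$

where

$$A = \big[\max_i |(\Phi_{i,x} \cdot 1)|\big]\big[\max_i |(1 \cdot \Psi_{i,x})|\big].$$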
For brevity, when x = 0, we write $\lambda_{i,x}$ as $\lambda_i$, $\Psi_{i,x}$ as $\Psi_i$, and $\Phi_{i,x}$ as $\Phi_i$. By
summing (10) over all α's except $\alpha_1$, $\alpha_k$ and $\alpha_n$ we obtain the probability of an
intermediate event leading to a result $\alpha_k$ if the results of the first and last events
are known to have been $\alpha_1$ and $\alpha_n$. With the aid of (21) and (22) it is easy to
show that this probability is exactly:


(26) 


V 

Z X"^X < _1 ^i(ai)<fii(a k )<p,(a k )<pj(a n ) 


Ex ?- 1 Z hMv.M 

*-l «l.«l» 


When n is very large, and when we have simultaneously n >> k >> 1, we can
rewrite this equation to include $\lambda_L$, and neglect all terms containing other i's and
j's. This leads to the results

a) If the number of events, n, in a simple chain is very large, the probability
$P_n(\alpha_k)$ of a kth event far removed from the first and the last, yielding a
result $\alpha_k$ when $\alpha_1$ and $\alpha_n$ are unspecified is

Pn(ot k ) ~ faiak) (PL^Otk) / ($i * 1)(1 • *l). 


(27) 



24 


ELLIOTT W. MONTROLL 


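b) When k = n, the probability of the result $\alpha_1$ of the first event leading to
the result $\alpha_n$ of the nth event is

$$P_n(\alpha_1, \alpha_n) = \frac{\displaystyle\sum_{i=1}^{\nu} \lambda_i^{n-1}\, \psi_i(\alpha_1)\, \varphi_i(\alpha_n)}{\displaystyle\sum_{i=1}^{\nu} \lambda_i^{n-1} \sum_{\alpha_1, \alpha_n} \psi_i(\alpha_1)\, \varphi_i(\alpha_n)}. \tag{28a}$$

So, as n → ∞

$$P_n(\alpha_1, \alpha_n) \sim \frac{\psi_L(\alpha_1)\, \varphi_L(\alpha_n)}{(\Phi_L \cdot 1)(1 \cdot \Psi_L)}. \tag{28b}$$

c) When there exists no knowledge concerning the result of the first event, the
probability of the nth event yielding the result $\alpha_n$ is

$$P_n(\alpha_n) = \sum_{\alpha_1} P_n(\alpha_1, \alpha_n) \sim \varphi_L(\alpha_n)/(\Phi_L \cdot 1). \tag{29}$$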

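In chains of sufficient length for (25) to be valid, the probability of

$$F(\alpha_1, \cdots, \alpha_n)$$

having a value between $\xi$ and $\xi + h$ has an especially simple asymptotic form.
From (6) this probability is (if for a given n we let $T = an^{1/2}$)

$$G(\xi + h) - G(\xi) = \frac{1}{2\pi i} \lim_{a \to \infty} \int_{-an^{1/2}}^{an^{1/2}} \frac{d\omega}{\omega}\, e^{-i\omega(\xi - \Lambda_1)} (1 - e^{-i\omega h}) \exp\Big\{-\Lambda_2 \frac{\omega^2}{2!} - i\Lambda_3 \frac{\omega^3}{3!} + \cdots\Big\} \tag{30}$$

and from (25) and (5)

$$\Lambda_m \sim n \lim_{x \to 0} d^m \log \lambda_{L,x}/dx^m = nL_m \tag{31}$$

if

$$L_m = \lim_{x \to 0} d^m \log \lambda_{L,x}/dx^m. \tag{32}$$

Letting $y = \omega n^{1/2}$, (30) becomes

$$G(\xi + h) - G(\xi) \sim \frac{1}{2\pi i} \lim_{a \to \infty} \int_{-a}^{a} \frac{dy}{y}\, \{e^{-iy\mu_1} - e^{-iy\mu_2}\}\, e^{-L_2 y^2/2} \Big\{1 - \frac{iL_3 y^3}{6n^{1/2}} + \cdots\Big\} \tag{33}$$

where

$$\mu_1 = (\xi - \Lambda_1)/n^{1/2}, \qquad \mu_2 = (\xi + h - \Lambda_1)/n^{1/2}, \tag{34a}$$

$$\Lambda_1 = \text{average value of } F(\alpha_1, \cdots, \alpha_n) = \bar F. \tag{34b}$$

Integrating (33)

$$G(\xi + h) - G(\xi) \sim (2\pi L_2)^{-1/2} \int_{\mu_1}^{\mu_2} e^{-\mu^2/2L_2}\, [1 + O(n^{-1/2})]\, d\mu. \tag{35}$$

As n → ∞ and h → 0

$$G(\xi + h) - G(\xi) \sim \frac{h}{(2\pi n L_2)^{1/2}} \exp\big(-[\xi - \bar F]^2/2nL_2\big), \tag{35a}$$

and the probability that $\xi < F < \xi + h$ becomes Gaussian.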

b. Examples of a simple chain. As an example of a simple Markoff chain let
us consider an event which can lead to either of two possible results, say "-1"
or "1". Further, let us suppose that the probability of a given result being
followed by an identical one is p and by one of another type is (1 - p); that is,

$$p(-1, -1) = p(1, 1) = p$$

$$p(-1, 1) = p(1, -1) = 1 - p.$$

This chain would be encountered in an analysis of a sequence of tosses of a
coin with a "memory" so that the probability of two successive tosses showing
the same face of the coin would be p and that of showing opposite faces (1 - p).

A question one might ask concerning such a chain is: What is the probability
of the occurrence of a given number of transitions from one kind of result to
another? In the chain of results

$$-1, -1, -1, 1, 1, -1, 1, -1, -1, -1$$

there would be four transitions, one corresponding to each -1 followed by a 1
and to each 1 followed by a -1. The function giving the number of transitions
in a sequence of n events is

$$F(\alpha_1, \cdots, \alpha_n) = \sum_{i=1}^{n-1} h(\alpha_i, \alpha_{i+1}) \tag{36}$$

where

$$h(-1, -1) = h(1, 1) = 0$$

$$h(-1, 1) = h(1, -1) = 1.$$

Even though the α's are dependent, in this special case, $h(\alpha_i, \alpha_{i+1})$ and
$h(\alpha_{i+1}, \alpha_{i+2})$ are independent so that (40) could have been obtained on this basis.

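To apply the methods described in the beginning of this section we must find
the characteristic values and vectors of the matrix

$$P_x = \begin{pmatrix} p & (1 - p)e^x\\ (1 - p)e^x & p \end{pmatrix} \tag{37}$$

(the configuration index α has the value either -1 or 1 in this case instead of
"1" and "2" as given in (17)). The characteristic values are the roots of the
equation

$$\begin{vmatrix} p - \lambda & (1 - p)e^x\\ (1 - p)e^x & p - \lambda \end{vmatrix} = 0;$$

that is,

$$\lambda_{1,x} = p + (1 - p)e^x = \lambda_{L,x}, \qquad |\lambda_{2,x}| = |p - (1 - p)e^x| < \lambda_{1,x} \tag{38}$$

and the characteristic vectors are

$$\Psi_{1,x} = 2^{-1/2} \begin{pmatrix} 1\\ 1 \end{pmatrix} \quad \text{and} \quad \Psi_{2,x} = 2^{-1/2} \begin{pmatrix} 1\\ -1 \end{pmatrix}.$$

The ψ and φ vectors have the same components in this case because of the
symmetry of the $P_x$ matrix. Clearly

$$\lambda_L = \lambda_1 = \lambda_{1,0} = 1; \qquad \lambda_2 = \lambda_{2,0} = 2p - 1,$$

$$\psi_1(\alpha) = 2^{-1/2} \quad \text{and} \quad \psi_2(\alpha) = -\alpha \cdot 2^{-1/2}.$$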

From (26) we see that if the result of the first event in the chain is $\alpha_1$, and
that of the nth event is $\alpha_n$, the probability of the kth event yielding the result
$\alpha_k$ is

$$\frac{[(2p - 1)^{k-1} \alpha_1 \alpha_k + 1][1 + (2p - 1)^{n-k} \alpha_k \alpha_n]}{2[1 + (2p - 1)^{n-1} \alpha_1 \alpha_n]}.$$

As k, n - 1 and (n - k) simultaneously get very large, $P_n(\alpha_k) \sim \tfrac{1}{2}$ independently
of $\alpha_k$.

The probability of an initial result $\alpha_1$ leading to a final result $\alpha_n$ is (from (28a))

$$P_n(\alpha_1, \alpha_n) = \tfrac{1}{4}\{1 + (2p - 1)^{n-1} \alpha_1 \alpha_n\}$$

so that

$$P_n(1, 1) = P_n(-1, -1) = \tfrac{1}{4}\{1 + (2p - 1)^{n-1}\}$$

$$P_n(-1, 1) = P_n(1, -1) = \tfrac{1}{4}\{1 - (2p - 1)^{n-1}\}.$$
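The end-to-end probabilities of the two-result chain can be checked by multiplying out the weights directly. The sketch below is only an illustration: the names `chain_end_probability` and `closed_form` are assumptions introduced here, and the normalization follows (11), that is, the weight for a given first and last result is divided by the sum over all results, the first and last included, so that the four values add up to unity.

```python
import math


def chain_end_probability(p, n, first, last):
    """P_n(first, last) as in (11), by repeated multiplication of the weights."""
    states = (-1, 1)
    step = {(a, b): (p if a == b else 1.0 - p) for a in states for b in states}
    # weights[(s, t)]: sum over intermediate results of the product of p's.
    weights = {(s, t): (1.0 if s == t else 0.0) for s in states for t in states}
    for _ in range(n - 1):
        weights = {
            (s, t): sum(weights[(s, u)] * step[(u, t)] for u in states)
            for s in states
            for t in states
        }
    return weights[(first, last)] / sum(weights.values())


def closed_form(p, n, first, last):
    """The closed expression obtained from the two characteristic values."""
    return 0.25 * (1.0 + (2.0 * p - 1.0) ** (n - 1) * first * last)


if __name__ == "__main__":
    for first in (-1, 1):
        for last in (-1, 1):
            print(first, last, chain_end_probability(0.7, 6, first, last),
                  closed_form(0.7, 6, first, last))
```

If the result of the first event is regarded as given rather than summed over, the corresponding conditional probability is twice the value above.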

Now, to answer our original question regarding the probability distribution
of the transition function (36)

$$F(\alpha_1, \cdots, \alpha_n) = \sum_{i=1}^{n-1} h(\alpha_i, \alpha_{i+1}),$$

we use the expression for $Z_n(x)$ determined from (24)

$$Z_n(x) = 2[p + (1 - p)e^x]^{n-1}. \tag{39}$$


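From (9) the probability of there being between $\xi$ and $\xi + h$ transitions in a
sequence of n + 1 events is

$$\begin{aligned}
G(\xi + h) - G(\xi) &= \frac{1}{2\pi i} \int_{-\infty}^{\infty} e^{-i\omega\xi}\, (1 - e^{-i\omega h}) \{p + (1 - p)e^{i\omega}\}^n\, d\omega/\omega\\
&= \frac{1}{2\pi i} \sum_{k=0}^{n} \frac{n!}{(n - k)!\,k!} (1 - p)^k p^{n-k} \int_{-\infty}^{\infty} \{e^{-i\omega(\xi - k)} - e^{-i\omega(\xi + h - k)}\}\, d\omega/\omega.
\end{aligned} \tag{40}$$

Letting $x = \omega h/2$ and rearranging

$$G(\xi + h) - G(\xi) = \sum_{k=0}^{n} \frac{n!}{(n - k)!\,k!} (1 - p)^k p^{n-k}\, D\Big(1 + \frac{2}{h}(\xi - k)\Big),$$

where $D(\lambda)$ is the Dirichlet integral

$$D(\lambda) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin x \cos \lambda x}{x}\, dx = \begin{cases} 0 & \text{if } |\lambda| > 1\\ \tfrac{1}{2} & \text{if } |\lambda| = 1\\ 1 & \text{if } |\lambda| < 1. \end{cases}$$

We therefore have, when $[\xi + h] < n$

$$G(\xi + h) - G(\xi) = \sum_{k=[\xi + 1]}^{[\xi + h]} \frac{n!}{(n - k)!\,k!} (1 - p)^k p^{n-k}. \tag{41}$$

Here [x] denotes the greatest integer not exceeding x. The sum is zero if
$[\xi + h] < [\xi + 1]$. When $[\xi + h] > n$

$$G(\xi + h) - G(\xi) = 1 - \sum_{k=0}^{[\xi]} \frac{n!}{k!\,(n - k)!} (1 - p)^k p^{n-k}. \tag{42}$$

When n is large it is difficult to get a clear picture of the function $G(\xi)$ from
(41) and (42), so we shall develop asymptotic results for large n by using (6)
instead of (9).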
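The binomial terms appearing in (41) and (42) can be compared with a direct enumeration of short chains. This is a minimal sketch, assuming a chain of n + 1 events weighted as in (10); the function names `transition_distribution` and `binomial_term` are illustrative and not from the paper.

```python
import itertools
import math


def transition_distribution(p, n):
    """Distribution of the number of transitions in a chain of n + 1 events."""
    dist = [0.0] * (n + 1)
    total = 0.0
    for chain in itertools.product((-1, 1), repeat=n + 1):
        weight = 1.0
        transitions = 0
        for a, b in zip(chain, chain[1:]):
            weight *= p if a == b else 1.0 - p
            transitions += a != b  # a change of result counts as one transition
        dist[transitions] += weight
        total += weight
    return [w / total for w in dist]


def binomial_term(p, n, k):
    """The kth term of the sums in (41) and (42)."""
    return math.comb(n, k) * (1.0 - p) ** k * p ** (n - k)


if __name__ == "__main__":
    for k, value in enumerate(transition_distribution(0.7, 6)):
        print(k, value, binomial_term(0.7, 6, k))
```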

By employing (5), we see that (this section will be developed on the basis of
n + 1 trials instead of n)

$$\Lambda_1 = \bar F = n(1 - p)$$

$$\Lambda_2 = np(1 - p)$$

$$\Lambda_3 = np(1 - p)(2p - 1) \quad \text{etc.}$$
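The first three semi-invariants can be confirmed with exact arithmetic, using the standard relations between cumulants and moments about the origin. The sketch assumes the number of transitions in n + 1 trials follows the binomial terms of (41); the function name `transition_cumulants` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def transition_cumulants(p, n):
    """First three semi-invariants of the number of transitions in n + 1 trials."""
    q = 1 - p  # probability of a transition at any one step
    pmf = [comb(n, k) * q**k * p ** (n - k) for k in range(n + 1)]
    m1, m2, m3 = (sum(k**r * w for k, w in enumerate(pmf)) for r in (1, 2, 3))
    # Cumulants expressed through moments about the origin.
    return m1, m2 - m1**2, m3 - 3 * m2 * m1 + 2 * m1**3


if __name__ == "__main__":
    p, n = Fraction(2, 3), 7
    print(transition_cumulants(p, n))
    print(n * (1 - p), n * p * (1 - p), n * p * (1 - p) * (2 * p - 1))
```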


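Therefore, from (6)

$$\Delta G = G(\xi + h) - G(\xi) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{e^{-i\omega(\xi - \Lambda_1)}}{\omega} (1 - e^{-i\omega h}) \exp\big[-\tfrac{1}{2} np(1 - p)\omega^2 - inp(1 - p)(2p - 1)\omega^3/6 - \cdots\big]\, d\omega.$$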


Letting u — am*, we have 

__ ^ f iu(E-f-h—Ap/nij 

2?Tl J— oo U 

»©] 




kiyC'-'-'b 


ip( 1 - ?>)(2p - l)u 


6n* 


e -J«‘j>U-jO du 


where 


Mi* = (£ + h — Ai)/n* 

Ms = (f - Ai)//i*. 

Since 

f e ^e +* du = (t/o)* exp (-X 2 /4o) 

J—oo 

we have for large n 


AG 


1 


(43a) 


[2irp(l - p)]i 


/ M2 

^-X2/0p(l-p) 

Hi 

<2 ' , = pt( , -s f ro X ^j) + 0 ©}' ft - 


I, 


As ?i —> oc and /t 


V 2p(l 

0, this becomes 


Gft + A) - Gtt) 


(43b) 


, h exp (- [£ - /'f/2p(l - p)ra| 

[2top( 1 - p)]* 


1 


(2p - OttjlZ.) 

2p(i — p)n 


+ 


»«■ 


A similar problem which occurs in statistics of high polymers can be stated
abstractly as follows. Suppose there exists a sequence of events each of which
leads to a translation of length a of a point either to the right or to the left, and
that the probability of a translation continuing in the same direction as its
predecessor is p while that of changing its direction is (1 - p). After n
translations what is the probability of a point being displaced a distance $\xi$ from its
origin?

If "-1" represents a translation to the left and "+1" a translation to the
right,

$$p(-1, -1) = p(1, 1) = p$$

$$p(-1, 1) = p(1, -1) = (1 - p).$$


The function giving the distance of the point from its origin after n displacements
is (when α = ±1)

$$F(\alpha_1, \cdots, \alpha_n) = a \sum_{j=1}^{n} \alpha_j = \tfrac{1}{2} a\alpha_1 + h(\alpha_1, \alpha_2) + \cdots + h(\alpha_{n-1}, \alpha_n) + \tfrac{1}{2} a\alpha_n$$

where

$$h(1, 1) = a, \qquad h(-1, -1) = -a$$

$$h(1, -1) = h(-1, 1) = 0.$$

Neglecting the terms $a\alpha_1/2$ and $a\alpha_n/2$ in $F(\alpha_1, \cdots, \alpha_n)$, one can answer questions
concerning this problem by evaluating $Z_n(x)$ as defined by (15). In this case
$P_x$ has the form (rows and columns in the order α = 1, -1)

$$P_x = \begin{pmatrix} pe^{ax} & 1 - p\\ 1 - p & pe^{-ax} \end{pmatrix}.$$

Its characteristic roots are

$$\lambda_{1,x} = p \cosh ax + [p^2 \cosh^2 ax + (1 - 2p)]^{1/2} = \lambda_{L,x}$$

$$|\lambda_{2,x}| = |p \cosh ax - [p^2 \cosh^2 ax + (1 - 2p)]^{1/2}| < \lambda_{1,x}$$

and its characteristic vectors:

$$\Psi_{1,x} = [(p - 1)^2 + (pe^{ax} - \lambda_{1,x})^2]^{-1/2} \begin{pmatrix} p - 1\\ pe^{ax} - \lambda_{1,x} \end{pmatrix}$$

$$\Psi_{2,x} = [(p - 1)^2 + (pe^{ax} - \lambda_{2,x})^2]^{-1/2} \begin{pmatrix} p - 1\\ pe^{ax} - \lambda_{2,x} \end{pmatrix}.$$

Since

$$\bar F = \Lambda_1 = \lim_{x \to 0} \partial \log Z_n(x)/\partial x,$$

one can show in the present problem that $\bar F = 0$. Therefore, the probability
of the translated point being a distance between $\xi$ and $\xi + h$ from the origin
after (n + 1) translations, is, as n → ∞ and h → 0

$$G(\xi + h) - G(\xi) \sim h(2\pi n L_2)^{-1/2} \exp(-\xi^2/2nL_2)$$

where $L_2$ is by (32):

$$L_2 = \lim_{x \to 0} d^2 \log \lambda_{L,x}/dx^2 = a^2 p/(1 - p).$$

Thus,

$$G(\xi + h) - G(\xi) \sim h[2\pi a^2 np/(1 - p)]^{-1/2} \exp\{-\xi^2 (1 - p)/2a^2 np\}.$$

When p = 2/3 this problem is equivalent to the determination of the probability
distribution of the components in an arbitrary direction of the distance
between the ends of a linear polymer. In this case

$$G(\xi + h) - G(\xi) \sim h(4\pi a^2 n)^{-1/2} \exp(-\xi^2/4a^2 n),$$

a result obtained by Tobolsky [12] after a lengthy and complicated combinatory
calculation.
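The value $a^2 p/(1 - p)$ for the second derivative of the logarithm of the largest characteristic root at x = 0 can be verified numerically. The sketch below is a rough check under stated assumptions: the names `largest_root` and `second_cumulant_rate` and the finite-difference step are choices made here, and a central difference stands in for the analytic derivative.

```python
import math


def largest_root(p, a, x):
    """Largest characteristic root p cosh(ax) + [p^2 cosh^2(ax) + 1 - 2p]^(1/2)."""
    c = p * math.cosh(a * x)
    return c + math.sqrt(c * c + 1.0 - 2.0 * p)


def second_cumulant_rate(p, a, step=1e-4):
    """Central-difference estimate of d^2 log(largest root)/dx^2 at x = 0."""
    def log_root(x):
        return math.log(largest_root(p, a, x))
    return (log_root(step) - 2.0 * log_root(0.0) + log_root(-step)) / step**2


if __name__ == "__main__":
    p, a = 2.0 / 3.0, 1.5
    print(second_cumulant_rate(p, a), a * a * p / (1.0 - p))
```

With p = 2/3 the ratio p/(1 - p) is 2, which is the source of the factor 4 in the polymer formula.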

Another type of simple chain is encountered in the determination of the 
“life span” of a particle which is displaced a unit distance to the right or left 
per unit time along a straight line until it collides with an absorbing boundary 
either — (q + 1) or (p + 1) units from the starting point. This problem has 
been analyzed by M. Kac using the methods discussed in the present paper. 
We shall generalize his results to include the effect of an attraction of the particle 
toward one end of the line so that displacements toward that end are more 
probable than those in the other direction. 

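Following the notation of Kac [13] we let $X_j$ represent the jth displacement,
$m_j$ its length, and $\delta(m)$ the probability of a given displacement having the
length m. Then,

$$\delta(m) = \begin{cases} s & \text{if } m = 1\\ 1 - s & \text{if } m = -1\\ 0 & \text{otherwise.} \end{cases}$$

If N represents the life span of a particle, the probability of its exceeding n is

$$\text{Prob}\,\{N > n\} = \text{Prob}\,\{-q \leq X_1 \leq p,\ -q \leq X_1 + X_2 \leq p,\ \cdots,\ -q \leq X_1 + X_2 + \cdots + X_n \leq p\} = \sum \delta(m_1)\, \delta(m_2) \cdots \delta(m_n)$$

where the summation extends over all integers $m_1, m_2, \cdots, m_n$ such that

$$-q \leq m_1 \leq p,\quad -q \leq m_1 + m_2 \leq p,\quad \cdots,\quad -q \leq m_1 + m_2 + \cdots + m_n \leq p.$$

Defining the new set of variables

$$\alpha_j = q + m_1 + m_2 + \cdots + m_j \qquad (j = 1, 2, \cdots, n)$$

we see that

$$\text{Prob}\,\{N > n\} = \sum_{\alpha_1, \cdots, \alpha_n = 0}^{p+q} \delta(\alpha_1 - q)\, \delta(\alpha_2 - \alpha_1) \cdots \delta(\alpha_n - \alpha_{n-1}).$$

As before, if we introduce the P matrix (of p + q + 1 rows and columns)

$$P = (\delta(\alpha - \beta)) = \begin{pmatrix}
0 & 1 - s & 0 & 0 & \cdots\\
s & 0 & 1 - s & 0 & \cdots\\
0 & s & 0 & 1 - s & \cdots\\
\vdots & & & & \ddots
\end{pmatrix}$$

we obtain after applying the equivalent of (22)

$$\text{Prob}\,\{N > n\} = \sum_{j=1}^{p+q+1} \lambda_j^n\, \varphi_j(q) \sum_{\alpha_n = 0}^{p+q} \psi_j(\alpha_n),$$

where $\lambda_j$ is the jth characteristic value of P, and $\psi_j$ and $\varphi_j$ are its associated
characteristic vectors as defined by (19) and (20) (here the range of α starts
from 0 instead of 1 as in (17) and (19)).

It is easy to show that the characteristic values of P are

$$\lambda_j = 2[s(1 - s)]^{1/2} \cos \zeta_j, \qquad (j = 1, 2, \cdots, p + q + 1)$$

where

$$\zeta_j = \pi j/(p + q + 2)$$

and that the components of the characteristic vectors are

$$\psi_j(\alpha) = [2/(p + q + 2)]^{1/2} [s/(1 - s)]^{\alpha/2} \sin (\alpha + 1)\zeta_j, \qquad (\alpha = 0, 1, \cdots, p + q)$$

and

$$\varphi_j(\alpha) = [2/(p + q + 2)]^{1/2} [(1 - s)/s]^{\alpha/2} \sin (\alpha + 1)\zeta_j.$$

Since

$$\sum_{\alpha=0}^{p+q} \psi_j(\alpha) = \frac{\sqrt{2}}{\sqrt{p + q + 2}}\ \frac{(1 - s)\{1 - (-1)^j (s/(1 - s))^{(p+q+2)/2}\} \sin \zeta_j}{1 - 2[s(1 - s)]^{1/2} \cos \zeta_j}$$

we finally have

$$\text{Prob}\,\{N > n\} = \frac{2^{n+1} (1 - s)^{(n+q+2)/2}\, s^{(n-q)/2}}{p + q + 2} \sum_{j=1}^{p+q+1} \frac{\{1 - (-1)^j (s/(1 - s))^{(p+q+2)/2}\} \cos^n \zeta_j\, \sin \zeta_j\, \sin (q + 1)\zeta_j}{1 - 2\sqrt{s(1 - s)} \cos \zeta_j}.$$

When s = 1/2 this reduces to the result of Kac ($\sum^*$ means summation is only over
odd j's):

$$\text{Prob}\,\{N > n\} = \frac{2}{p + q + 2} \sum_{j=1}^{p+q+1}{}^{*} \cos^n \zeta_j\, \sin (q + 1)\zeta_j\, \cot \tfrac{1}{2}\zeta_j.$$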
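The spectral expression for the life span can be compared with a direct step-by-step recursion on the position of the particle. This is a sketch under stated assumptions: positions are indexed α = 0, ..., p + q with the start at α = q, the function names `survival_by_recursion` and `survival_by_spectrum` are illustrative, and the characteristic values and vectors used are those quoted in the text, $\lambda_j = 2[s(1-s)]^{1/2}\cos\zeta_j$ with components proportional to $[s/(1-s)]^{\pm\alpha/2}\sin(\alpha+1)\zeta_j$.

```python
import math


def survival_by_recursion(s, p, q, n):
    """Prob{N > n} by propagating the particle one displacement at a time."""
    size = p + q + 1
    prob = [0.0] * size
    prob[q] = 1.0  # the particle starts q units from the left-hand end
    for _ in range(n):
        nxt = [0.0] * size
        for a, w in enumerate(prob):
            if a + 1 < size:
                nxt[a + 1] += s * w          # displacement to the right
            if a - 1 >= 0:
                nxt[a - 1] += (1.0 - s) * w  # displacement to the left
        prob = nxt                           # mass stepping outside is absorbed
    return sum(prob)


def survival_by_spectrum(s, p, q, n):
    """Prob{N > n} from the characteristic values and vectors of P."""
    m = p + q + 2
    c = math.sqrt(s / (1.0 - s))
    total = 0.0
    for j in range(1, m):
        zeta = math.pi * j / m
        lam = 2.0 * math.sqrt(s * (1.0 - s)) * math.cos(zeta)
        psi_sum = sum(c**a * math.sin((a + 1) * zeta) for a in range(m - 1))
        phi_q = c ** (-q) * math.sin((q + 1) * zeta)
        total += lam**n * psi_sum * phi_q
    return 2.0 / m * total


if __name__ == "__main__":
    print(survival_by_recursion(0.6, 3, 2, 7), survival_by_spectrum(0.6, 3, 2, 7))
```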


4. Simple Chains with Restrictions. Often when studying chains of dependent
events, certain functions averaged over the entire chains are known to be
restricted between definite limits. That is, there might exist k functions
$g_j(\alpha_1, \alpha_2, \cdots, \alpha_n)$ such that

$$-\Delta G_j < G_j - g_j(\alpha_1, \cdots, \alpha_n) < \Delta G_j, \qquad (j = 1, 2, \cdots, k), \tag{44}$$

where the $G_j$'s and $\Delta G_j$'s are preassigned constants. To calculate averages of
other functions (1) is no longer valid, for it is an unrestricted sum over all sets
of α's, including those incompatible with (44). All unrestricted sums in this
formula (and other similar ones) must be replaced by sums over only those
sets of α's compatible with (44). Since it is sometimes more difficult to evaluate
restricted sums than unrestricted ones, we shall apply an idea of Markoff [1]
to the reduction of the former to the latter type.

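Let us seek an explicit expression for a function $P_n^*(\alpha_1, \alpha_2, \cdots, \alpha_n)$ which
has the property:

$$P_n^*(\alpha_1, \cdots, \alpha_n) = \begin{cases} P_n(\alpha_1, \cdots, \alpha_n) & \text{when the α's are chosen so that (44) is satisfied for all } j\\ 0 & \text{otherwise.} \end{cases}$$

Since the Dirichlet integrals

$$\delta_j = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin (\rho_j \Delta G_j)}{\rho_j} \exp (i\rho_j \gamma_j)\, d\rho_j$$

have the property

$$\delta_j = \begin{cases} 1 & \text{when } -\Delta G_j < \gamma_j < \Delta G_j\\ 0 & \text{otherwise,} \end{cases}$$

$$P_n^*(\alpha_1, \cdots, \alpha_n) = \delta_1 \delta_2 \cdots \delta_k\, P_n(\alpha_1, \cdots, \alpha_n)$$

has the required character provided

$$\gamma_j = G_j - g_j(\alpha_1, \cdots, \alpha_n).$$

The average value of a function $F(\alpha_1, \cdots, \alpha_n)$ can be written in terms of the
unrestricted sum

$$\bar F = \sum_{\{\alpha_i\}} F(\alpha_1, \cdots, \alpha_n)\, P_n^*(\alpha_1, \cdots, \alpha_n) \Big/ \sum_{\{\alpha_i\}} P_n^*(\alpha_1, \cdots, \alpha_n),$$

where the summation extends over the complete set of $\{\alpha_i\}$'s

$$\{\alpha_i\} = (\alpha_1, \alpha_2, \cdots, \alpha_n).$$

As in the case of chains without auxiliary restrictions, a useful function is

$$\begin{aligned}
Z_n(x) &= \sum_{\{\alpha_i\}} P_n^*(\alpha_1, \cdots, \alpha_n) \exp\{xF(\alpha_1, \cdots, \alpha_n)\}\\
&= \frac{1}{\pi^k} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} S_n(x, \rho_1, \cdots, \rho_k) \prod_{m=1}^{k} \Big\{\frac{\sin (\rho_m \Delta G_m)}{\rho_m}\, e^{i\rho_m G_m}\, d\rho_m\Big\}
\end{aligned} \tag{45}$$

where

$$S_n(x, \rho_1, \cdots, \rho_k) = \sum_{\{\alpha_i\}} P_n(\alpha_1, \cdots, \alpha_n) \exp\Big\{xF(\alpha_1, \cdots, \alpha_n) - i \sum_{j=1}^{k} \rho_j\, g_j(\alpha_1, \cdots, \alpha_n)\Big\}.$$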


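When $F(\alpha_1, \cdots, \alpha_n)$ and $\{g_j(\alpha_1, \cdots, \alpha_n)\}$ are all additive or multiplicative
functions of the form (13a) and (13b), say

$$F(\alpha_1, \cdots, \alpha_n) = \sum_{k=1}^{n-1} h(\alpha_k, \alpha_{k+1})$$

$$g_j(\alpha_1, \cdots, \alpha_n) = \sum_{k=1}^{n-1} g_j(\alpha_k, \alpha_{k+1})$$

and the probability chain is a simple one, $Z_n(x)$ reduces to a simple form.
Suppose

$$P_n(\alpha_1, \cdots, \alpha_n) = \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1})$$

then following the derivation of (24), we have

$$S_n(x, \rho_1, \cdots, \rho_k) = \sum_{i} \{\lambda_{i,x,\rho}\}^{n-1} (\Phi_{i,x,\rho} \cdot 1)(1 \cdot \Psi_{i,x,\rho}) \tag{46}$$

where $\lambda_{i,x,\rho}$, $\Phi_{i,x,\rho}$ and $\Psi_{i,x,\rho}$ are characteristic values and vectors of the matrix

$$P_{x,\rho} = \begin{pmatrix}
p_{x,\rho}(1, 1) & \cdots & p_{x,\rho}(1, \nu)\\
\vdots & & \vdots\\
p_{x,\rho}(\nu, 1) & \cdots & p_{x,\rho}(\nu, \nu)
\end{pmatrix}$$

and

$$p_{x,\rho}(\alpha, \beta) = p(\alpha, \beta) \exp\Big\{x h(\alpha, \beta) - i \sum_{j} \rho_j\, g_j(\alpha, \beta)\Big\}.$$

Substitution of (46) into (45) allows one to calculate $Z_n(x)$.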


5. More Complicated Chains. In a chain of N events in which the result of
each event depends on those of its n predecessors (n << N), the calculation of
$Z_n(x)$ proceeds in essentially the same manner as in the case of a simple chain.
Let the N events be divided into N/n sets of "grand events" of n simple events
each (for simplicity we assume N is divisible by n; this can easily be avoided).
Thus, if each simple event could lead to any one of $\nu$ possible results, a grand
event could lead to any one of $\nu^n$ possible results and a complicated chain becomes
a simple chain of grand events with the result of each grand event depending on
the preceding grand event. Quantitative calculations thus proceed formally in
the same manner as in a simple chain.
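The reduction to grand events can be illustrated on the smallest non-trivial case: two possible results per event, each event depending on its two predecessors. The sketch below is hypothetical throughout: the weight table `w[a][b][c]` (the weight of result c following results a, b), the function names, and the convention that the first two events carry unit weight are all assumptions made for the illustration.

```python
import itertools
import math


def brute_force_sum(w, length):
    """Sum of chain weights over all 2**length sequences of results."""
    total = 0.0
    for chain in itertools.product((0, 1), repeat=length):
        weight = 1.0
        for k in range(2, length):
            weight *= w[chain[k - 2]][chain[k - 1]][chain[k]]
        total += weight
    return total


def grand_event_sum(w, length):
    """Same sum, treating consecutive pairs of events as grand events."""
    blocks = list(itertools.product((0, 1), repeat=2))
    # Weight of grand event b = (b0, b1) following grand event a = (a0, a1).
    t = {
        (a, b): w[a[0]][a[1]][b[0]] * w[a[1]][b[0]][b[1]]
        for a in blocks
        for b in blocks
    }
    row = {b: 1.0 for b in blocks}
    for _ in range(length // 2 - 1):
        row = {b: sum(row[a] * t[(a, b)] for a in blocks) for b in blocks}
    return sum(row.values())


if __name__ == "__main__":
    table = [[[0.9, 0.1], [0.4, 0.6]], [[0.7, 0.3], [0.2, 0.8]]]
    print(brute_force_sum(table, 8), grand_event_sum(table, 8))
```

The grand-event matrix here has $2^2 = 4$ rows and columns, in line with the $\nu^n$ possible results of a grand event.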

6. Continuous Case. In this section we generalize, by studying an example, 
to the case in which each event in a simple chain may lead to any one of a con¬ 
tinuum of results. The example is a problem arising in statistical mechanics of 
molecular chains. 

Consider a linear chain of n identical molecules whose centers of mass remain 
at a set of fixed regularly spaced positions, but which may rotate about their 






centers of mass in a plane. Suppose that the potential energy of interaction
between neighboring pairs of molecules is a function of the angles a specified
axis of the molecules makes with the line connecting the centers of mass of the
molecules; that is, the potential energy of interaction between pairs of adjacent
molecules can be written as $V(\theta_j, \theta_{j+1})$. Assuming that forces are sufficiently
short ranged that interaction between more distant neighbors can be neglected,
Boltzmann's theorem states that the probability of the axis of the first molecule
making an angle between $\theta_1$ and $\theta_1 + d\theta_1$ with the line of centers of the chain,
the second between $\theta_2$ and $\theta_2 + d\theta_2$, and the nth between $\theta_n$ and $\theta_n + d\theta_n$ is
proportional to

\[ \exp\Bigl[ -\frac{1}{kT} \bigl\{ V(\theta_1, \theta_2) + V(\theta_2, \theta_3) + \cdots + V(\theta_{n-1}, \theta_n) \bigr\} \Bigr] \, d\theta_1 \cdots d\theta_n \]


where k is Boltzmann's constant and T is the absolute temperature. The 
contribution of the interaction to the thermodynamic properties of the chain 
can be derived from the partition function 


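\[ Z_n = \int_0^{2\pi} \cdots \int_0^{2\pi} \exp\Bigl\{ -\frac{1}{kT} \bigl[ V(\theta_1, \theta_2) + \cdots + V(\theta_{n-1}, \theta_n) \bigr] \Bigr\} \, d\theta_1 \cdots d\theta_n. \tag{47} \]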


For example, the internal energy is 

\[ E = \partial \log Z_n / \partial(-1/kT) \]

and the specific heat is $c = \partial E / \partial T$.

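It is to be noted that $Z_n$ is exactly the integral of the iterated kernel of the
integral equation

\[ \lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp\Bigl\{ -\frac{1}{kT} V(\theta_1, \theta_2) \Bigr\} \, d\theta_2. \tag{48} \]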


If $V(\theta_1, \theta_2)$ is symmetrical in $\theta_1$ and $\theta_2$, this linear homogeneous integral equation
has a set of orthonormal characteristic functions $\{\psi_j(\theta)\}$ such that

\[ \int_0^{2\pi} \psi_j(\theta) \psi_k(\theta) \, d\theta = \delta_{jk}. \tag{49} \]


To each of these characteristic functions there corresponds a characteristic value
$\lambda_j$. Now it is well known that the kernel of (48) can be expanded as a series in
its characteristic functions

\[ \exp\Bigl\{ -\frac{1}{kT} V(\theta_1, \theta_2) \Bigr\} = \sum_j \lambda_j \psi_j(\theta_1) \psi_j(\theta_2). \]

Introducing this expression into (47) and applying the orthogonality conditions
(49), one obtains

\[ Z_n = \sum_j \lambda_j^{\,n-1} \Bigl\{ \int_0^{2\pi} \psi_j(\theta) \, d\theta \Bigr\}^2. \tag{47a} \]





Probably the most interesting example of a molecular chain of the type 
described above is a chain of magnetic dipoles which are restricted to rotate only 
in a plane. In that case 

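\[ V(\theta_j, \theta_{j+1}) = \frac{\mu^2}{r^3} \bigl[ \cos(\theta_j - \theta_{j+1}) - 3 \cos\theta_j \cos\theta_{j+1} \bigr], \]

where $\mu$ is the magnetic moment of each dipole and r is the distance between a
pair of adjacent centers of mass. This potential function leads to the integral
equation

\[ \lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp\Bigl\{ -\frac{\mu^2}{r^3 kT} \bigl[ \cos(\theta_1 - \theta_2) - 3 \cos\theta_1 \cos\theta_2 \bigr] \Bigr\} \, d\theta_2. \]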

Since this equation is rather complicated to solve, we shall devote the rest of the 
section to a potential function of less physical interest, but which leads to a less 
formidable integral equation. 

In studying hindered rotation of molecules, one sometimes uses potential 
functions of the form: 


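\[ V(\theta_j, \theta_{j+1}) = -\beta \cos(\theta_j - \theta_{j+1}) \]

where $\beta$ is a constant. With this potential function (48) becomes

\[ \lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp\{ J \cos(\theta_1 - \theta_2) \} \, d\theta_2 \tag{50} \]

where

\[ J = \beta / kT. \]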

The characteristic functions and characteristic values of (50) are easily found
with the aid of the Fourier series for $\exp(J \cos\theta)$:

\[ \exp(J \cos\theta) = I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) \cos m\theta \tag{51} \]

where $I_m(J)$ is the mth Bessel function of imaginary argument:

\[ I_m(J) = \sum_{k=0}^{\infty} \frac{(\tfrac{1}{2} J)^{2k+m}}{(m + k)! \, k!}. \]

From (51)

\[ \exp[J \cos(\theta_1 - \theta_2)] = I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) (\cos m\theta_1 \cos m\theta_2 + \sin m\theta_1 \sin m\theta_2). \]

Substituting this expression into (50) we have

\[ \lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \Bigl\{ I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) (\cos m\theta_1 \cos m\theta_2 + \sin m\theta_1 \sin m\theta_2) \Bigr\} \, d\theta_2. \]





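Because of the orthogonality of the trigonometric functions, one can verify by
direct substitution that the characteristic functions are

\[ \psi_0(\theta) = 1/(2\pi)^{1/2}, \]

\[ \psi_m^{(1)}(\theta) = \pi^{-1/2} \sin m\theta; \qquad \psi_m^{(2)}(\theta) = \pi^{-1/2} \cos m\theta, \qquad (m = 1, 2, \ldots) \]

and the corresponding characteristic values are

\[ \lambda_0 = 2\pi I_0(J), \]

\[ \lambda_m^{(1)} = \lambda_m^{(2)} = 2\pi I_m(J), \qquad m > 0. \]

Introducing these characteristic functions and values into (47a), and noting that
only $\psi_0$ has a nonvanishing integral over $(0, 2\pi)$, we obtain the simple formula
for the partition function:

\[ Z_n = 2\pi [2\pi I_0(J)]^{n-1}. \]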
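The closed form for the partition function can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the integral (47) is approximated by an equally spaced rule in each angle, and $I_0(J)$ is summed from the power series quoted with (51).

```python
import math

def bessel_i(m, J, terms=40):
    """I_m(J) from its power series (truncated after `terms` terms)."""
    return sum((J / 2) ** (2 * k + m) / (math.factorial(m + k) * math.factorial(k))
               for k in range(terms))

def partition_function_quadrature(n, J, M=64):
    """Approximate (47) for V = -beta*cos(theta_j - theta_{j+1}), J = beta/kT.

    Uses an M-point equally spaced rule in every angle and repeated
    multiplication by the discretized kernel exp(J cos(theta - theta')).
    """
    h = 2 * math.pi / M
    theta = [h * i for i in range(M)]
    kernel = [[h * math.exp(J * math.cos(a - b)) for b in theta] for a in theta]
    vec = [1.0] * M                      # integrate out theta_n first
    for _ in range(n - 1):
        vec = [sum(row[j] * vec[j] for j in range(M)) for row in kernel]
    return h * sum(vec)                  # final integration over theta_1

def partition_function_closed(n, J):
    """Z_n = 2*pi*(2*pi*I_0(J))**(n-1)."""
    return 2 * math.pi * (2 * math.pi * bessel_i(0, J)) ** (n - 1)

numeric = partition_function_quadrature(5, 1.5)
closed = partition_function_closed(5, 1.5)
```

For a smooth periodic integrand the equally spaced rule converges very quickly, so a modest M already reproduces the closed form to many digits.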

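The internal energy of the molecular chain is therefore

\[ E = \partial \log Z_n / \partial(-1/kT) = -\beta (n - 1) I_1(J) / I_0(J), \]

and the specific heat is:

\[ c = \frac{\beta^2 (n - 1)}{2 k T^2} \Bigl\{ 1 + \frac{I_2(J)}{I_0(J)} - 2 \Bigl[ \frac{I_1(J)}{I_0(J)} \Bigr]^2 \Bigr\}. \]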
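As a hedged consistency check, the specific heat can be compared with a numerical derivative of the internal energy $E = -\beta(n-1) I_1(J)/I_0(J)$. The closed form used in the code is one algebraically equivalent way of writing $\partial E/\partial T$, obtained from the Bessel recurrences $I_0' = I_1$ and $I_1' = (I_0 + I_2)/2$; the function names and the unit choice k = 1 are assumptions made only for this illustration.

```python
import math

def bessel_i(m, J, terms=40):
    """I_m(J) from its power series."""
    return sum((J / 2) ** (2 * k + m) / (math.factorial(m + k) * math.factorial(k))
               for k in range(terms))

def energy(n, beta, k, T):
    """E = -beta (n-1) I_1(J)/I_0(J) with J = beta/(k T)."""
    J = beta / (k * T)
    return -beta * (n - 1) * bessel_i(1, J) / bessel_i(0, J)

def specific_heat_closed(n, beta, k, T):
    """dE/dT written with I_0, I_1, I_2 (an equivalent form, see lead-in)."""
    J = beta / (k * T)
    i0, i1, i2 = (bessel_i(m, J) for m in range(3))
    return 0.5 * k * (n - 1) * J ** 2 * (1 + i2 / i0 - 2 * (i1 / i0) ** 2)

def specific_heat_numeric(n, beta, k, T, dT=1e-5):
    """Central-difference approximation to dE/dT."""
    return (energy(n, beta, k, T + dT) - energy(n, beta, k, T - dT)) / (2 * dT)

c_closed = specific_heat_closed(10, 1.0, 1.0, 0.8)
c_numeric = specific_heat_numeric(10, 1.0, 1.0, 0.8)
```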

REFERENCES 

[1] A. Markoff, Wahrscheinlichkeitsrechnung, Leipzig, 1912.

[2] S. Bernstein, "Sur l'extension du théorème limite du calcul des probabilités aux
sommes de quantités dépendantes," Math. Ann., Vol. 97 (1927), p. 1.

[3] M. Fréchet, Recherches Théoriques Modernes sur la Théorie des Probabilités, Vol. 2,
Paris (1937).

[4] H. Kramers and G. Wannier, "Statistics of the two-dimensional ferromagnet: Part I,"
Phys. Rev., Vol. 60 (1941), p. 252.

[5] E. Montroll, "Statistical mechanics of nearest neighbor systems," Jour. Chem. Phys.,
Vol. 9 (1941), p. 708; Vol. 10 (1942), p. 61.

[6] E. Lassettre and J. Howe, "Thermodynamic properties of binary solid solutions on
the basis of the nearest neighbor approximation," Jour. Chem. Phys., Vol. 9
(1941), p. 747.

[7] J. Ashkin and W. E. Lamb, "The propagation of order in crystal lattices," Phys. Rev.,
Vol. 64 (1943), p. 159.

[8] L. Onsager, "Crystal statistics I. A two-dimensional model with an order-disorder
transition," Phys. Rev., Vol. 65 (1944), p. 117.

[9] M. Hostinsky, Méthodes générales du Calcul des Probabilités, Paris, 1931.

[10] H. Cramér, Random Variables and Probability Distributions, Cambridge Univ. Press,
1937, Chap. 4.

[11] G. Frobenius, "Über Matrizen aus positiven Elementen. II.," Preuss. Akad. Wiss.
Sitz., (1909), p. 514.

[12] A. Tobolsky, Powell and H. Eyring, an article in Chemistry of Large Molecules,
Interscience Publishers, 1943, pp. 156, 182.

[13] M. Kac, "Random walk in the presence of absorbing barriers," Annals of Math. Stat.,
Vol. 14 (1945), p. 62.



ON THE FIRST TWO MOMENTS OF THE MEASURE OF A 
RANDOM SET 

By L. A. Santaló

Universidad Nacional del Litoral, Argentina

1. Introduction. In a recent paper [3] H. E. Robbins derived general formulas 
for the moments of the measure of any random set X, and applied the formulas 
to find the mean and the variance of a random sum of intervals on a line. In 
subsequent papers, J. Bronowski and J. Neyman [1], using other methods, found
the variance when X is a random sum of rectangles in the plane, and H. E. 
Robbins [4] found the variance when X is a random sum of n-dimensional 
intervals in n-dimensional euclidean space. In the latter paper Robbins 
solved also the corresponding problem for circles on the plane. 

Using the methods of Robbins, our purpose in the present paper is to solve the 
following similar problems: 

(i) Let R denote the rectangle consisting of all points (x, y) such that $0 \le x \le A_1$,
$0 \le y \le A_2$, and let R' denote the larger rectangle for which $-\delta \le x \le A_1 + \delta$,
$-\delta \le y \le A_2 + \delta$. Let $\rho$ denote a rectangle of fixed dimensions, $a \times b$, but
variable position in the plane. The position of $\rho$ will be determined by the
coordinates x, y of its center P and the angle $\varphi$ between the side of length a and
the x-axis. We suppose $(a^2 + b^2)^{1/2} \le \min(A_1, A_2, \delta)$. Let a fixed number N of
rectangles $\rho$ be chosen independently with the probability density function for
the coordinates $(x, y, \varphi)$ of each rectangle constant and equal to $1/\pi R'$ in the
three-dimensional interval with base R' and height $\pi$ and zero outside this
interval. In section 3 we evaluate the first two moments of the measure of X,
where X denotes the intersection of the set-theoretical sum of the N rectangles
$\rho$ with R.

(ii) Let R denote the n-dimensional interval consisting of all points $(x_1, x_2,
x_3, \ldots, x_n)$ such that $0 \le x_i \le A_i$ $(i = 1, 2, \ldots, n)$, and let R' denote the
larger interval for which $-\delta \le x_i \le A_i + \delta$. Let a fixed number N of n-dimensional
spheres with radii r (such that $2r \le \min(A_i, 2\delta)$) be chosen independently,
with the probability density function for the centre of each n-sphere constant
and equal to $1/R'$ in R' and zero outside this interval. Denoting by X the
intersection of the set-theoretical sum of the N n-spheres with R, we evaluate
in section 4 the first two moments of the measure of X. This problem is a
generalization to n-dimensional space of the case considered by Robbins for the
plane (n = 2) in [4].

2. Preliminary formulas. Let K be an indeformable plane convex figure of 
variable position in the plane. The position of K may be determined by the 
coordinates (x, y) of a point P fixed within K and the angle <p which measures 
the rotation of K about P. We shall call x, y, <p the coordinates of K . The 



measure of a set of figures congruent with K is defined as being the integral of the
differential form

\[ dK = dx \, dy \, d\varphi. \tag{2.1} \]

It is readily shown that this measure does not depend on the particular point P
chosen to determine the position of K [5]. For instance, the measure of the
set of figures K, each of which contains in its interior a fixed point Q, has the
value $2\pi F$, where F denotes the area of K; that is,

\[ \int_{Q \in K} dK = 2\pi F. \tag{2.2} \]

Let $P_1$ and $P_2$ be two fixed points and let l be the distance $P_1 P_2$. The measure
of the set of figures congruent with K, each of which contains both points $P_1$
and $P_2$ in its interior, will be a function of K and l, say $\mu(K, l)$. If d is the
diameter of K, that is, the maximal distance between two points of K, we have
$\mu(K, l) = 0$ for $l \ge d$.

Examples. Let K be a rectangle $\rho$ of fixed dimensions $a \times b$, and let us
suppose $a \le b$. The diameter d of $\rho$ is $d = (a^2 + b^2)^{1/2}$. Let P(x, y) be the
centre of $\rho$ and $\varphi$ the angle which the side of length b forms with the line
segment $P_1 P_2$ of length l. If we first keep $\varphi$ constant, then in order that there exist
positions of $\rho$ in which it contains the line segment $P_1 P_2$ in its interior it is
necessary that

\[ a - l \sin\varphi \ge 0, \qquad b - l \cos\varphi \ge 0, \]

and in this case the area covered by the centres P in all these positions has the
value

\[ (a - l \sin\varphi)(b - l \cos\varphi). \]

Integrating over all permissible values of $\varphi$ we obtain

\[ \mu(\rho, l) = 4 \int_{\arccos [b/l]_1}^{\arcsin [a/l]_1} (a - l \sin\varphi)(b - l \cos\varphi) \, d\varphi \tag{2.3} \]

where we define

\[ [x]_1 = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x > 1. \end{cases} \]

Carrying out the obvious integration in (2.3) we have

\[ \mu(\rho, l) = \begin{cases}
2\pi ab - 4l(a + b) + 2l^2 & \text{for } l \le a \le b \\[4pt]
4 \bigl( ab \arcsin(a/l) - \tfrac{1}{2} a^2 - bl + b (l^2 - a^2)^{1/2} \bigr) & \text{for } a \le l \le b \\[4pt]
4 \bigl( ab (\arcsin(a/l) - \arccos(b/l)) + b (l^2 - a^2)^{1/2} + a (l^2 - b^2)^{1/2} - \tfrac{1}{2}(a^2 + b^2) - \tfrac{1}{2} l^2 \bigr) & \text{for } a \le b \le l.
\end{cases} \tag{2.4} \]
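The piecewise closed form (2.4) can be compared with a direct numerical evaluation of the integral (2.3). The following is a minimal sketch under the assumption $a \le b$; the function names and the Simpson step count are illustrative choices, not part of the paper.

```python
import math

def clamp1(x):
    """The symbol [x]_1: x if x <= 1, otherwise 1."""
    return x if x <= 1 else 1.0

def mu_integral(a, b, l, steps=2000):
    """Evaluate (2.3) with a composite Simpson rule (steps must be even)."""
    if l == 0:
        return 2 * math.pi * a * b
    lo = math.acos(clamp1(b / l))
    hi = math.asin(clamp1(a / l))
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    f = lambda p: (a - l * math.sin(p)) * (b - l * math.cos(p))
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return 4 * total * h / 3

def mu_closed(a, b, l):
    """The three cases of (2.4), with mu = 0 beyond the diameter."""
    if l <= a:
        return 2 * math.pi * a * b - 4 * l * (a + b) + 2 * l * l
    if l <= b:
        return 4 * (a * b * math.asin(a / l) - a * a / 2 - b * l
                    + b * math.sqrt(l * l - a * a))
    if l < math.hypot(a, b):
        return 4 * (a * b * (math.asin(a / l) - math.acos(b / l))
                    + b * math.sqrt(l * l - a * a) + a * math.sqrt(l * l - b * b)
                    - (a * a + b * b) / 2 - l * l / 2)
    return 0.0
```

Evaluating both functions at one value of l in each of the three ranges, for example with a = 1 and b = 2, gives matching results to within the quadrature error.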





As another example, let R be the rectangle consisting of all points (x, y) such
that $0 \le x \le A_1$, $0 \le y \le A_2$ and let R' be the rectangle consisting of all points
(x, y) such that

\[ -\delta \le x \le A_1 + \delta, \qquad -\delta \le y \le A_2 + \delta, \qquad (a^2 + b^2)^{1/2} \le \min(A_1, A_2, \delta). \]

Let us consider the set of rectangles $\rho$ whose centers belong to R' and which do not
contain either $P_1$ or $P_2$, $P_1$ and $P_2$ being two fixed points which belong to R.
Let l be the distance $P_1 P_2$. According to (2.2) and the definition of $\mu(\rho, l)$,
the measure of the set of rectangles $\rho$ under consideration is

\[ 2\pi R' - 2 \cdot 2\pi\rho + \mu(\rho, l), \tag{2.5} \]

where $R' = (A_1 + 2\delta)(A_2 + 2\delta)$ and $\rho = ab$.

Let K be a plane convex figure of fixed position in its plane. Let us suppose
K to be translated a distance l in the direction $\theta$, and let $F(K, l, \theta)$ be the area
of the intersection of K with the translated figure. Obviously if d is the diameter
of K, $F(K, l, \theta) = 0$ for $l \ge d$. In what follows we shall consider the function

\[ \Phi(K, l) = \int_0^{2\pi} F(K, l, \theta) \, d\theta. \tag{2.6} \]

Example. Let K be a rectangle R of sides $A_1$, $A_2$. Let the symbol [x], as
in [1], be defined by

\[ [x] = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases} \]

It is then readily seen that

\[ F(R, l, \theta) = [A_1 - l |\sin\theta|] \, [A_2 - l |\cos\theta|]. \tag{2.7} \]

For our purpose the case in which $l \le \min(A_1, A_2)$ is of interest. In this case,
carrying out the immediate integrations, we obtain

\[ \Phi(R, l) = 2\pi A_1 A_2 - 4l(A_1 + A_2) + 2l^2. \tag{2.8} \]
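Formula (2.8) can be checked by integrating the overlap area (2.7) numerically over all directions. This is a small sketch with hypothetical function names; it assumes the overlap is measured with the absolute values of the sine and cosine, so that every direction $\theta$ in $(0, 2\pi)$ is handled.

```python
import math

def overlap_area(A1, A2, l, theta):
    """Area common to the A1 x A2 rectangle and its translate, as in (2.7)."""
    side1 = max(A1 - l * abs(math.sin(theta)), 0.0)
    side2 = max(A2 - l * abs(math.cos(theta)), 0.0)
    return side1 * side2

def phi_numeric(A1, A2, l, steps=4000):
    """Composite Simpson rule for (2.6).

    steps is a multiple of 8, so the kinks of |sin| and |cos| at multiples
    of pi/2 fall on panel boundaries.
    """
    h = 2 * math.pi / steps
    total = overlap_area(A1, A2, l, 0.0) + overlap_area(A1, A2, l, 2 * math.pi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * overlap_area(A1, A2, l, i * h)
    return total * h / 3

def phi_closed(A1, A2, l):
    """Right-hand side of (2.8), valid for l <= min(A1, A2)."""
    return 2 * math.pi * A1 * A2 - 4 * l * (A1 + A2) + 2 * l * l
```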

Let $S_{n,r}$ be an n-dimensional sphere of radius r. $S_{n,r}$ will denote also the
volume of this sphere, that is, as is known (see [2, p. 109]),

\[ S_{n,r} = \frac{\pi^{n/2} r^n}{\Gamma\bigl(\tfrac{n}{2} + 1\bigr)}. \tag{2.9} \]

Let us call the measure of a set of spheres $S_{n,r}$ the measure of the set of their
centers. That is, if the point $P(x_1, x_2, \ldots, x_n)$ is the center of $S_{n,r}$, the measure
of a set of spheres $S_{n,r}$ equals the integral, extended over the set, of the differential
form

\[ dP = dx_1 \, dx_2 \cdots dx_n. \tag{2.10} \]





For instance, the measure of the set of spheres $S_{n,r}$, each of which contains a
fixed point Q in its interior, has the value

\[ \int_{Q \in S_{n,r}} dP = S_{n,r}, \tag{2.11} \]

where $S_{n,r}$ is given by (2.9).

The measure $\mu(S_{n,r}, l)$ of the set of spheres $S_{n,r}$, each of which contains
totally in its interior a segment of length l ($l \le 2r$), equals the volume of the
intersection of two spheres $S_{n,r}$ whose centers are placed at the end points of the
given segment. That is, $\mu(S_{n,r}, l)$ equals twice the volume of the spherical
segment of an n-sphere of radius r and semiangle $\alpha = \arccos(l/2r)$. We will
represent the volume of this spherical segment by $S_{n,r}(\alpha)$, and it may be calculated
in the following way: The intersection of the n-sphere with a hyperplane at a
distance x from the center is an (n - 1)-dimensional sphere of radius $r' =
(r^2 - x^2)^{1/2}$. Let $S_{n-1,r'}$ denote the volume of this (n - 1)-dimensional sphere
(given by the general formula (2.9)). The volume of the spherical segment,
whose base lies at the distance $h = r \cos\alpha$ from the center, will be

\[ S_{n,r}(\alpha) = \int_h^r S_{n-1,r'} \, dx. \]

Putting $x = r \cos\theta$ and substituting for $S_{n-1,r'}$ the expression given in (2.9),
we obtain

\[ S_{n,r}(\alpha) = \frac{\pi^{(n-1)/2} r^n}{\Gamma\bigl(\tfrac{n+1}{2}\bigr)} \int_0^{\alpha} \sin^n\theta \, d\theta = r S_{n-1,r} \int_0^{\alpha} \sin^n\theta \, d\theta. \]

Consequently we can write

\[ \mu(S_{n,r}, l) = 2 S_{n,r}(\alpha) = 2r S_{n-1,r} \int_0^{\alpha} \sin^n\theta \, d\theta, \tag{2.12} \]

where $S_{n-1,r}$ is the volume of the (n - 1)-dimensional sphere of radius r and
$\alpha = \arccos(l/2r)$.

In (2.12) we may substitute

\[ \int_0^{\alpha} \sin^n\theta \, d\theta = \frac{(n-1)(n-3) \cdots 3 \cdot 1}{n(n-2) \cdots 4 \cdot 2} \arccos(l/2r) - \frac{l}{2r} \Bigl[ \frac{1}{n} \Bigl(1 - \frac{l^2}{4r^2}\Bigr)^{(n-1)/2} + \frac{n-1}{n(n-2)} \Bigl(1 - \frac{l^2}{4r^2}\Bigr)^{(n-3)/2} + \cdots + \frac{(n-1)(n-3) \cdots 3 \cdot 1}{n(n-2) \cdots 4 \cdot 2} \Bigl(1 - \frac{l^2}{4r^2}\Bigr)^{1/2} \Bigr] \tag{2.13} \]

for n even, and

\[ \int_0^{\alpha} \sin^n\theta \, d\theta = \frac{(n-1)(n-3) \cdots 4 \cdot 2}{n(n-2) \cdots 3} - \frac{l}{2r} \Bigl[ \frac{1}{n} \Bigl(1 - \frac{l^2}{4r^2}\Bigr)^{(n-1)/2} + \frac{n-1}{n(n-2)} \Bigl(1 - \frac{l^2}{4r^2}\Bigr)^{(n-3)/2} + \cdots + \frac{(n-1)(n-3) \cdots 4 \cdot 2}{n(n-2) \cdots 5 \cdot 3} \Bigr] \tag{2.14} \]

for n odd.

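In particular, for n = 2, 3 we have

\[ \mu(S_{2,r}, l) = 4r^2 \int_0^{\alpha} \sin^2\theta \, d\theta = 2r^2 \arccos(l/2r) - \tfrac{1}{2} l (4r^2 - l^2)^{1/2}, \tag{2.15} \]

\[ \mu(S_{3,r}, l) = 2\pi r^3 \int_0^{\alpha} \sin^3\theta \, d\theta = \tfrac{4}{3} \pi r^3 - \pi r^2 l + \tfrac{1}{12} \pi l^3. \tag{2.16} \]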
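The general expression (2.12) can be evaluated numerically and compared with the special cases n = 2 and n = 3. The sketch below uses hypothetical function names and a composite Simpson rule for the integral of $\sin^n\theta$; it is an illustration rather than a definitive implementation.

```python
import math

def ball_volume(n, r):
    """Volume S_{n,r} of the n-dimensional sphere of radius r, as in (2.9)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

def mu_sphere(n, r, l, steps=2000):
    """mu(S_{n,r}, l) from (2.12): 2 r S_{n-1,r} * integral of sin^n over (0, alpha)."""
    if l >= 2 * r:
        return 0.0
    alpha = math.acos(l / (2 * r))
    h = alpha / steps
    total = math.sin(alpha) ** n            # value at alpha; value at 0 is 0
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * h) ** n
    return 2 * r * ball_volume(n - 1, r) * total * h / 3

def mu_circle(r, l):
    """Closed form (2.15)."""
    return 2 * r * r * math.acos(l / (2 * r)) - 0.5 * l * math.sqrt(4 * r * r - l * l)

def mu_ball3(r, l):
    """Closed form (2.16)."""
    return 4 / 3 * math.pi * r ** 3 - math.pi * r * r * l + math.pi * l ** 3 / 12
```

At l = 0 the two spheres coincide, so `mu_sphere(n, r, 0)` should reproduce the full volume `ball_volume(n, r)` in any dimension.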


We shall now generalize the formula (2.8) to n-space. 

A direction in n-space may be given by the corresponding point on the surface
of the n-dimensional sphere of unit radius, that is, by the end point of the radius
which is parallel to the given direction. The parametric equations of the
n-sphere $\sum_{i=1}^{n} \xi_i^2 = 1$ are

\[ \begin{aligned}
\xi_1 &= \cos\varphi_1 \\
\xi_2 &= \sin\varphi_1 \cos\varphi_2 \\
\xi_3 &= \sin\varphi_1 \sin\varphi_2 \cos\varphi_3 \\
&\;\;\vdots \\
\xi_{n-1} &= \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{n-2} \cos\varphi_{n-1} \\
\xi_n &= \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{n-2} \sin\varphi_{n-1},
\end{aligned} \tag{2.17} \]

where $0 \le \varphi_i \le \pi$ for $i < n - 1$ and $0 \le \varphi_{n-1} \le 2\pi$. The element of area of
this n-sphere has the value (see [2, p. 109])

\[ d\sigma = \sin^{n-2}\varphi_1 \sin^{n-3}\varphi_2 \cdots \sin\varphi_{n-2} \, d\varphi_1 \, d\varphi_2 \cdots d\varphi_{n-1}. \tag{2.18} \]

A direction in n-dimensional space may then be given by the n - 1 parameters
$\varphi_1, \varphi_2, \ldots, \varphi_{n-1}$.

Given the n-dimensional interval R consisting of all points $(x_1, x_2, x_3, \ldots, x_n)$
such that $0 \le x_i \le A_i$ $(i = 1, 2, 3, \ldots, n)$, suppose that R is translated a
distance l ($l \le \min(A_1, A_2, A_3, \ldots, A_n)$) in the direction $(\varphi_1, \varphi_2, \ldots, \varphi_{n-1})$;
the intersection of the translated interval with R is a new interval whose volume
has the value $\prod_{i=1}^{n} (A_i - |x_i|)$, where $x_i = l \xi_i$ ($\xi_i$ given by (2.17)).






Our purpose is to evaluate the integral

\[ \Phi(R, l) = \int_{E_n} \prod_{i=1}^{n} (A_i - |x_i|) \, d\sigma \tag{2.19} \]

extended over the surface $E_n$ of the n-dimensional sphere of radius unity. We
shall denote by $E_m$ either the surface of the m-dimensional sphere of radius unity
or its area, given, as is known [2, p. 110], by

\[ E_m = \frac{2 \pi^{m/2}}{\Gamma(m/2)}. \tag{2.20} \]

Because of the symmetry, the coefficients of all the products $A_{i_1} A_{i_2} \cdots A_{i_{n-k}}$
have the same value

\[ \alpha_k = (-1)^k \int_{E_n} |x_1 x_2 \cdots x_k| \, d\sigma. \]


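The integral extended over the whole surface $E_n$ equals $2^k$ times the integral
extended over the portion for which $x_1, \ldots, x_k \ge 0$. Hence, taking into account (2.17)
and (2.18) we get

\[ \begin{aligned}
\alpha_k &= (-1)^k 2^k l^k E_{n-k} \int_0^{\pi/2} \cdots \int_0^{\pi/2} \sin^{n+k-3}\varphi_1 \cos\varphi_1 \, \sin^{n+k-5}\varphi_2 \cos\varphi_2 \cdots \sin^{n-k-1}\varphi_k \cos\varphi_k \, d\varphi_1 \, d\varphi_2 \cdots d\varphi_k \\
&= (-1)^k \frac{2^k l^k E_{n-k}}{(n + k - 2)(n + k - 4) \cdots (n + k - 2k)}
\end{aligned} \tag{2.21} \]

for $k = 1, 2, \ldots, n - 1$. For k = n we find that

\[ \begin{aligned}
\alpha_n &= (-1)^n 2^n l^n \int_0^{\pi/2} \cdots \int_0^{\pi/2} \sin^{2n-3}\varphi_1 \cos\varphi_1 \cdots \sin\varphi_{n-1} \cos\varphi_{n-1} \, d\varphi_1 \, d\varphi_2 \cdots d\varphi_{n-1} \\
&= (-1)^n \frac{2^n l^n}{(2n - 2)(2n - 4) \cdots 4 \cdot 2}.
\end{aligned} \tag{2.22} \]

Hence, we have the following general formula

\[ \Phi(R, l) = A_1 A_2 \cdots A_n E_n + (-1)^n \frac{2^n l^n}{(2n - 2)(2n - 4) \cdots 4 \cdot 2} + \sum_{k=1}^{n-1} (-1)^k \Bigl( \sum A_{i_1} A_{i_2} \cdots A_{i_{n-k}} \Bigr) \frac{2^k l^k E_{n-k}}{(n + k - 2)(n + k - 4) \cdots (n + k - 2k)}. \tag{2.23} \]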




In particular, for n = 2 this result coincides with (2.8). For n = 3 we have

\[ \Phi(R, l) = 4\pi A_1 A_2 A_3 - l^3 - 2\pi l (A_1 A_2 + A_1 A_3 + A_2 A_3) + \tfrac{8}{3} l^2 (A_1 + A_2 + A_3). \tag{2.24} \]
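The general formula (2.23) lends itself to a direct computation, which can then be compared with the special cases (2.8) and (2.24). The sketch below uses hypothetical function names and assumes Python 3.8 or later for `math.prod`; the inner sum over products of n - k side lengths is the elementary symmetric function of the sides.

```python
import math
from itertools import combinations

def sphere_area(m):
    """Surface area E_m of the unit sphere in m dimensions, as in (2.20)."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2)

def phi_interval(sides, l):
    """Evaluate (2.23) for an interval with the given side lengths."""
    n = len(sides)
    total = math.prod(sides) * sphere_area(n)
    # Term k = n: denominator (2n-2)(2n-4)...4*2.
    total += (-1) ** n * 2 ** n * l ** n / math.prod(range(2, 2 * n - 1, 2))
    for k in range(1, n):
        # Sum of all products of n-k distinct side lengths.
        sym = sum(math.prod(c) for c in combinations(sides, n - k))
        denom = math.prod(n + k - 2 * j for j in range(1, k + 1))
        total += (-1) ** k * sym * 2 ** k * l ** k * sphere_area(n - k) / denom
    return total

def phi_plane(A1, A2, l):
    """Special case (2.8)."""
    return 2 * math.pi * A1 * A2 - 4 * l * (A1 + A2) + 2 * l * l

def phi_space(A1, A2, A3, l):
    """Special case (2.24)."""
    return (4 * math.pi * A1 * A2 * A3 - l ** 3
            - 2 * math.pi * l * (A1 * A2 + A1 * A3 + A2 * A3)
            + 8 / 3 * l * l * (A1 + A2 + A3))
```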


3. First problem. We can now solve the first problem (i) stated in the intro¬ 
duction. Denoting by the same letters either sets or their measures, we consider, 
as in [1] and [4], the set Y of points of R that do not belong to X. We have 
identically: 

(3.1) X + Y = R. 

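The general method of Robbins [3], taking into account (2.2), gives immediately
the first moments

\[ E(Y) = R \Bigl( 1 - \frac{\rho}{R'} \Bigr)^N, \qquad E(X) = R \Bigl\{ 1 - \Bigl( 1 - \frac{\rho}{R'} \Bigr)^N \Bigr\}, \tag{3.2} \]

where $R = A_1 A_2$, $R' = (A_1 + 2\delta)(A_2 + 2\delta)$, $\rho = ab$.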

Our remaining problem is that of evaluating the second moment of X. Let
$x_i, y_i, \varphi_i$ $(i = 1, 2, 3, \ldots, N)$ be the coordinates of the N rectangles $\rho_i$ (section 2)
and let us put, as in (2.1), $d\rho_i = dx_i \, dy_i \, d\varphi_i$. Let P(x, y) and $P_0(x_0, y_0)$ be two
points which belong to R and let us put $dP = dx \, dy$, $dP_0 = dx_0 \, dy_0$. Let us
consider the following multiple integral

\[ J = \int \frac{dP \, dP_0 \, d\rho_1 \, d\rho_2 \cdots d\rho_N}{(2\pi R')^N} \tag{3.3} \]

extended over the sets of rectangles $\rho_i$ (congruent with $\rho$) such that $(x_i, y_i)$ belongs
to R', $0 \le \varphi_i \le 2\pi$, and which do not contain either P or $P_0$. That is, the domain of
integration of J is defined by

\[ \begin{gathered}
-\delta \le x_i \le A_1 + \delta, \qquad -\delta \le y_i \le A_2 + \delta, \qquad 0 \le \varphi_i \le 2\pi, \\
P \in R, \quad P_0 \in R, \quad P \notin \rho_i, \quad P_0 \notin \rho_i, \qquad (i = 1, 2, \ldots, N).
\end{gathered} \tag{3.4} \]

In order to calculate J, we can first keep the rectangles $\rho_i$ fixed; the points P
and $P_0$ can then vary independently over the set of points Y. That gives

\[ J = \frac{1}{(2\pi R')^N} \int_{(x_i, y_i) \in R'} Y^2 \, d\rho_1 \, d\rho_2 \cdots d\rho_N = E(Y^2). \tag{3.5} \]

We can now reverse the order of integration, an operation which is obviously
justified in this case. Keeping P and $P_0$ fixed, we can vary each rectangle $\rho_i$
over the set of positions in which it does not contain either P or $P_0$; letting l
denote the distance $P P_0$, we have, according to (2.5),

\[ J = \int_{P \in R, \, P_0 \in R} \Bigl( 1 - \frac{4\pi\rho - \mu(\rho, l)}{2\pi R'} \Bigr)^N dP \, dP_0. \tag{3.6} \]





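In order to evaluate this integral we divide it into two parts $J = J_1 + J_2$,
according as $0 \le l \le d$ or $d \le l \le D$, where $d = (a^2 + b^2)^{1/2}$ and $D = (A_1^2 + A_2^2)^{1/2}$.
In the interval $0 \le l \le d$ we introduce the new variables of integration l, $\theta$
related to $x, y, x_0, y_0$ by

\[ x_0 = x + l \cos\theta, \qquad y_0 = y + l \sin\theta \tag{3.7} \]

whence

\[ \frac{\partial(x, y, x_0, y_0)}{\partial(x, y, l, \theta)} = l. \]

In terms of the new variables we have

\[ J_1 = \int_{0 \le l \le d} \Bigl( 1 - \frac{4\pi\rho - \mu(\rho, l)}{2\pi R'} \Bigr)^N l \, dP \, dl \, d\theta. \]

In this integral the point P can vary over the intersection of R with the figure
obtained by translating R a distance l in the direction $\theta$; that is, the integration
of dP gives the function $F(R, l, \theta)$ defined in section 2. According to (2.6) we
therefore have

\[ J_1 = \int_0^d \Bigl( 1 - \frac{4\pi\rho - \mu(\rho, l)}{2\pi R'} \Bigr)^N \Phi(R, l) \, l \, dl, \tag{3.8} \]

where $\mu(\rho, l)$ is given by (2.4) and $\Phi(R, l)$ by (2.8).

In order to evaluate $J_2$ we observe that in the interval $d \le l \le D$, $\mu(\rho, l) = 0$
and we have

\[ J_2 = \Bigl( 1 - \frac{2\rho}{R'} \Bigr)^N \int_{d \le l \le D} dP \, dP_0 = \Bigl( 1 - \frac{2\rho}{R'} \Bigr)^N \Bigl\{ \int_{0 \le l \le D} dP \, dP_0 - \int_{0 \le l \le d} dP \, dP_0 \Bigr\}. \]

Further we have

\[ \int_{0 \le l \le D} dP \, dP_0 = R^2 \tag{3.9} \]

and with the change of variables (3.7) and the formula (2.8) we find that

\[ \int_{0 \le l \le d} dP \, dP_0 = \int_0^d \Phi(R, l) \, l \, dl = \pi A_1 A_2 d^2 - \tfrac{4}{3} (A_1 + A_2) d^3 + \tfrac{1}{2} d^4. \tag{3.10} \]

Collecting (3.8), (3.9), (3.10) and taking into account (3.5) we have

\[ E(Y^2) = \int_0^d \Bigl( 1 - \frac{4\pi\rho - \mu(\rho, l)}{2\pi R'} \Bigr)^N \Phi(R, l) \, l \, dl + \Bigl( 1 - \frac{2\rho}{R'} \Bigr)^N \bigl\{ R^2 - \pi A_1 A_2 d^2 + \tfrac{4}{3} (A_1 + A_2) d^3 - \tfrac{1}{2} d^4 \bigr\}, \tag{3.11} \]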




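where $\rho = ab$, $R = A_1 A_2$, $R' = (A_1 + 2\delta)(A_2 + 2\delta)$, $\mu(\rho, l)$ is given by (2.4) and
$\Phi(R, l)$ by (2.8).

For the variance of X and of Y, we have by (3.1) and (3.2)

\[ \sigma^2 = E(X^2) - E^2(X) = E(Y^2) - E^2(Y) = \int_0^d \Bigl( 1 - \frac{4\pi\rho - \mu(\rho, l)}{2\pi R'} \Bigr)^N \Phi(R, l) \, l \, dl + \Bigl( 1 - \frac{2\rho}{R'} \Bigr)^N \bigl\{ R^2 - \pi A_1 A_2 d^2 + \tfrac{4}{3} (A_1 + A_2) d^3 - \tfrac{1}{2} d^4 \bigr\} - R^2 \Bigl( 1 - \frac{\rho}{R'} \Bigr)^{2N}, \]

which completes the solution of our first problem stated in the introduction.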
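The mean and variance for the first problem can be evaluated numerically from (3.2) and (3.11). The following is a minimal sketch, not the author's procedure: the function names, the Simpson step count, and the sample parameter values are assumptions made for illustration, and the code assumes $a \le b$ and $(a^2 + b^2)^{1/2} \le \min(A_1, A_2, \delta)$.

```python
import math

def mu_rect(a, b, l):
    """mu(rho, l) from (2.4), assuming a <= b."""
    if l <= a:
        return 2 * math.pi * a * b - 4 * l * (a + b) + 2 * l * l
    if l <= b:
        return 4 * (a * b * math.asin(a / l) - a * a / 2 - b * l
                    + b * math.sqrt(l * l - a * a))
    if l < math.hypot(a, b):
        return 4 * (a * b * (math.asin(a / l) - math.acos(b / l))
                    + b * math.sqrt(l * l - a * a) + a * math.sqrt(l * l - b * b)
                    - (a * a + b * b) / 2 - l * l / 2)
    return 0.0

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def mean_and_variance(A1, A2, delta, a, b, N):
    """Return (E(X), variance of X) using (3.2) and (3.11)."""
    R, Rp, rho = A1 * A2, (A1 + 2 * delta) * (A2 + 2 * delta), a * b
    d = math.hypot(a, b)
    phi = lambda l: 2 * math.pi * R - 4 * l * (A1 + A2) + 2 * l * l   # (2.8)
    integrand = lambda l: ((1 - (4 * math.pi * rho - mu_rect(a, b, l))
                            / (2 * math.pi * Rp)) ** N * phi(l) * l)
    EY = R * (1 - rho / Rp) ** N
    EY2 = simpson(integrand, 0.0, d) + (1 - 2 * rho / Rp) ** N * (
        R * R - math.pi * R * d * d + 4 * (A1 + A2) * d ** 3 / 3 - d ** 4 / 2)
    return R - EY, EY2 - EY * EY

mean0, var0 = mean_and_variance(10.0, 10.0, 3.0, 1.0, 2.0, 0)
mean5, var5 = mean_and_variance(10.0, 10.0, 3.0, 1.0, 2.0, 5)
```

With N = 0 no rectangle is placed, so both the mean and the variance of X should vanish, which gives a simple sanity check of the two terms in (3.11).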

4. Second problem. In order to solve the second problem (ii) stated in the
introduction we will follow the same method as in the preceding section.

Let X be the intersection of the set-theoretical sum of the N n-dimensional
spheres $S_{n,r}$ of radius r with the n-interval R. Let us call Y the set of those points
of R that do not belong to X, that is,

\[ X + Y = R. \tag{4.1} \]

The general method of Robbins gives immediately

\[ E(Y) = R \Bigl( 1 - \frac{S_{n,r}}{R'} \Bigr)^N, \qquad E(X) = R \Bigl\{ 1 - \Bigl( 1 - \frac{S_{n,r}}{R'} \Bigr)^N \Bigr\}, \tag{4.2} \]

where $R = \prod_{i=1}^{n} A_i$, $R' = \prod_{i=1}^{n} (A_i + 2\delta)$, and $S_{n,r}$ is given by (2.9).

We now proceed to calculate $E(Y^2)$. For this purpose let $Q_1(y_1^1, y_2^1, \ldots, y_n^1)$
and $Q_2(y_1^2, y_2^2, \ldots, y_n^2)$ be two points which belong to R and $P_i(x_1^i, x_2^i, \ldots, x_n^i)$
be the centers of the N spheres $S_{n,r}$. Let us put

\[ dQ_i = dy_1^i \, dy_2^i \cdots dy_n^i \quad (i = 1, 2), \qquad dP_i = dx_1^i \, dx_2^i \cdots dx_n^i \quad (i = 1, 2, \ldots, N). \tag{4.3} \]

Consider the integral

\[ J = \int \frac{dQ_1 \, dQ_2 \, dP_1 \, dP_2 \cdots dP_N}{R'^N} \tag{4.4} \]

extended over the domain defined by

\[ Q_1 \in R, \quad Q_2 \in R, \quad P_i \in R', \quad Q_1 P_i \ge r, \quad Q_2 P_i \ge r, \qquad (i = 1, 2, \ldots, N). \]

If we keep $P_1, P_2, P_3, \ldots, P_N$ fixed, each point $Q_1$, $Q_2$ can vary independently
over the set Y; consequently we have

\[ J = \frac{1}{R'^N} \int_{P_i \in R'} Y^2 \, dP_1 \, dP_2 \cdots dP_N = E(Y^2). \tag{4.5} \]


On the other hand, if we keep $Q_1$ and $Q_2$ fixed, the integral of each $dP_i$ gives
$R' - 2S_{n,r} + \mu(S_{n,r}, l)$, where $\mu(S_{n,r}, l)$ is given by (2.12) and $l = Q_1 Q_2$.
Hence we have

\[ J = \int_{Q_1 \in R, \, Q_2 \in R} \Bigl( 1 - \frac{2S_{n,r} - \mu(S_{n,r}, l)}{R'} \Bigr)^N dQ_1 \, dQ_2. \tag{4.6} \]

In order to calculate this integral we split it into two parts $J = J_1 + J_2$,
according as $0 \le l \le 2r$ or $2r \le l \le D$, where $D = \bigl( \sum_{i=1}^{n} A_i^2 \bigr)^{1/2}$. In the interval
$0 \le l \le 2r$ we introduce the new variables of integration $l, \varphi_1, \varphi_2, \ldots, \varphi_{n-1}$
related to $y_1^1, y_2^1, \ldots, y_n^1, y_1^2, y_2^2, \ldots, y_n^2$ by

\[ y_i^2 = y_i^1 + l \xi_i \qquad (i = 1, 2, \ldots, n), \tag{4.7} \]

where $\xi_i$ is given in (2.17). It is found that

\[ \frac{\partial(y_1^1, y_2^1, \ldots, y_n^1, y_1^2, y_2^2, \ldots, y_n^2)}{\partial(y_1^1, y_2^1, \ldots, y_n^1, l, \varphi_1, \ldots, \varphi_{n-1})} = l^{n-1} \sin^{n-2}\varphi_1 \sin^{n-3}\varphi_2 \cdots \sin\varphi_{n-2}. \]

Hence we have

\[ dQ_1 \, dQ_2 = l^{n-1} \, dQ_1 \, d\sigma \, dl, \tag{4.8} \]

where $d\sigma$ denotes the element of area of the n-dimensional sphere of unit radius,
given by (2.18). The same method used in section 3 gives

\[ J_1 = \int_0^{2r} \Bigl( 1 - \frac{2S_{n,r} - \mu(S_{n,r}, l)}{R'} \Bigr)^N \Phi(R, l) \, l^{n-1} \, dl, \tag{4.9} \]

where $\Phi(R, l)$ is given by (2.23).

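In the interval $2r \le l \le D$, $\mu(S_{n,r}, l) = 0$ and we have

\[ J_2 = \Bigl( 1 - \frac{2S_{n,r}}{R'} \Bigr)^N \Bigl\{ \int_{0 \le l \le D} dQ_1 \, dQ_2 - \int_{0 \le l \le 2r} dQ_1 \, dQ_2 \Bigr\}. \tag{4.10} \]

Now we have

\[ \int_{0 \le l \le D} dQ_1 \, dQ_2 = R^2 \tag{4.11} \]

and with the change of variables (4.7) we readily find that

\[ \int_{0 \le l \le 2r} dQ_1 \, dQ_2 = \int_0^{2r} \Phi(R, l) \, l^{n-1} \, dl. \tag{4.12} \]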

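Collecting (4.9), (4.10), (4.11), (4.12) and taking into account (4.5) and
(2.23) we have

\[ \begin{aligned}
E(Y^2) = {} & \int_0^{2r} \Bigl( 1 - \frac{2S_{n,r} - \mu(S_{n,r}, l)}{R'} \Bigr)^N \Phi(R, l) \, l^{n-1} \, dl \\
& + \Bigl( 1 - \frac{2S_{n,r}}{R'} \Bigr)^N \Bigl\{ R^2 - \frac{(2r)^n}{n} R E_n - (-1)^n \frac{2^{3n} r^{2n}}{2n(2n - 2) \cdots 4 \cdot 2} \\
& \qquad - \sum_{k=1}^{n-1} (-1)^k \Bigl( \sum A_{i_1} A_{i_2} \cdots A_{i_{n-k}} \Bigr) \frac{2^{n+2k} E_{n-k} \, r^{n+k}}{(n + k)(n + k - 2) \cdots (n + k - 2k)} \Bigr\},
\end{aligned} \tag{4.13} \]

where $R = \prod_{i=1}^{n} A_i$, $R' = \prod_{i=1}^{n} (A_i + 2\delta)$; $S_{n,r}$ is given by (2.9), $E_m$ by (2.20), $\mu(S_{n,r}, l)$
by (2.12) and $\Phi(R, l)$ by (2.23). In particular, for n = 2, we obtain the value
given by Robbins [3, (30)], by use of (2.8), (2.15) and the equations $S_{2,r} = \pi r^2$,
$E_1 = 2$. For n = 3, the case of ordinary space, it follows from (2.16), (2.24)
and the equations $S_{3,r} = \tfrac{4}{3} \pi r^3$, $E_3 = 4\pi$, $E_2 = 2\pi$, that

\[ \begin{aligned}
E(Y^2) = {} & \int_0^{2r} \Bigl( 1 - \frac{16\pi r^3 + 12\pi r^2 l - \pi l^3}{12 R'} \Bigr)^N \Bigl( 4\pi R - l^3 - 2\pi (A_1 A_2 + A_1 A_3 + A_2 A_3) l + \tfrac{8}{3} (A_1 + A_2 + A_3) l^2 \Bigr) l^2 \, dl \\
& + \Bigl( 1 - \frac{8\pi r^3}{3R'} \Bigr)^N \Bigl\{ R^2 - \tfrac{32}{3} \pi R r^3 + 8\pi (A_1 A_2 + A_1 A_3 + A_2 A_3) r^4 - \tfrac{256}{15} (A_1 + A_2 + A_3) r^5 + \tfrac{32}{3} r^6 \Bigr\}.
\end{aligned} \tag{4.14} \]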

In this case the exact evaluation is easy if one expands the binomial under the
sign of the integral and integrates term by term.

From (4.1) we see that $\sigma_X^2 = E(X^2) - E^2(X) = E(Y^2) - E^2(Y)$. Thus,
from (4.2) and (4.13) we obtain immediately the second moment $E(X^2)$ and the
variance $\sigma_X^2$ of X.


5. Remark. In the second problem we can replace the n-intervals R and
R' by concentric n-dimensional spheres. The problem may then be stated as
follows:

Let $S_{n,a}$ denote a fixed n-dimensional sphere of radius a and $S_{n,a+\delta}$ the
concentric n-dimensional sphere of radius $a + \delta$. $S_{n,a}$ and $S_{n,a+\delta}$ shall also denote
the corresponding volumes. Let a fixed number N of n-dimensional spheres
with radii r ($r \le \min(a, \delta)$) be chosen independently with the probability density
function for the center of each $S_{n,r}$ constant and equal to $1/S_{n,a+\delta}$ in $S_{n,a+\delta}$
and zero outside this n-sphere. Let X denote the intersection of the set-theoretical
sum of the N n-spheres with $S_{n,a}$; we wish to evaluate the first two
moments of the measure of X.





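It suffices to observe that in this case we have

\[ \Phi(S_{n,a}, l) = \mu(S_{n,a}, l) E_n = 2a S_{n-1,a} E_n \int_0^{\alpha} \sin^n\theta \, d\theta, \tag{5.1} \]

where $S_{n-1,a}$ is the volume of the (n - 1)-dimensional sphere of radius a and
$\alpha = \arccos(l/2a)$.

The same method used in section 4 gives

\[ E(Y) = S_{n,a} \Bigl( 1 - \frac{S_{n,r}}{S_{n,a+\delta}} \Bigr)^N, \qquad E(X) = S_{n,a} \Bigl\{ 1 - \Bigl( 1 - \frac{S_{n,r}}{S_{n,a+\delta}} \Bigr)^N \Bigr\}, \tag{5.2} \]

\[ E(Y^2) = \int_0^{2r} \Bigl( 1 - \frac{2S_{n,r} - \mu(S_{n,r}, l)}{S_{n,a+\delta}} \Bigr)^N \Phi(S_{n,a}, l) \, l^{n-1} \, dl + \Bigl( 1 - \frac{2S_{n,r}}{S_{n,a+\delta}} \Bigr)^N \Bigl\{ S_{n,a}^2 - \int_0^{2r} \Phi(S_{n,a}, l) \, l^{n-1} \, dl \Bigr\}, \tag{5.3} \]

where $\Phi(S_{n,a}, l)$ is given by (5.1).

In particular, for n = 2, by use of (5.1), (2.15) and the indefinite integrals

\[ \int \arccos(l/2a) \, l \, dl = (\tfrac{1}{2} l^2 - a^2) \arccos(l/2a) - \tfrac{1}{4} l (4a^2 - l^2)^{1/2} + \text{constant}, \]

\[ \int l^2 (4a^2 - l^2)^{1/2} \, dl = -\tfrac{1}{4} l (4a^2 - l^2)^{3/2} + \tfrac{1}{2} a^2 l (4a^2 - l^2)^{1/2} + 2a^4 \arcsin(l/2a) + \text{constant}, \]

we find that

\[ \begin{aligned}
E(Y^2) = {} & 2\pi \int_0^{2r} \Bigl( 1 - \frac{2\pi r^2 - 2r^2 \arccos(l/2r) + \tfrac{1}{2} l (4r^2 - l^2)^{1/2}}{\pi (a + \delta)^2} \Bigr)^N \bigl( 2a^2 \arccos(l/2a) - \tfrac{1}{2} l (4a^2 - l^2)^{1/2} \bigr) \, l \, dl \\
& + \Bigl( 1 - \frac{2r^2}{(a + \delta)^2} \Bigr)^N \Bigl\{ \pi^2 a^4 - 2\pi \bigl( 2a^2 (2r^2 - a^2) \arccos(r/a) - 3a^2 r (a^2 - r^2)^{1/2} + \pi a^4 + 2r (a^2 - r^2)^{3/2} - a^4 \arcsin(r/a) \bigr) \Bigr\}.
\end{aligned} \]

For n = 3, we have by (5.1) and (2.16)

\[ \begin{aligned}
E(Y^2) = {} & 4\pi \int_0^{2r} \Bigl( 1 - \frac{16r^3 + 12r^2 l - l^3}{16 (a + \delta)^3} \Bigr)^N \bigl( \tfrac{4}{3} \pi a^3 - \pi a^2 l + \tfrac{1}{12} \pi l^3 \bigr) \, l^2 \, dl \\
& + 4\pi \Bigl( 1 - \frac{2r^3}{(a + \delta)^3} \Bigr)^N \Bigl\{ \tfrac{4}{9} \pi a^6 - \tfrac{32}{9} \pi a^3 r^3 + 4\pi a^2 r^4 - \tfrac{8}{9} \pi r^6 \Bigr\}.
\end{aligned} \]

From (5.2) and (5.3) with the use of the relation $\sigma_X^2 = E(X^2) - E^2(X) =
E(Y^2) - E^2(Y)$ we obtain immediately the second moment $E(X^2)$ and the
variance $\sigma_X^2$ of X.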





REFERENCES 

[1] J. Bronowski and J. Neyman, "The variance of the measure of a two-dimensional
random set," Annals of Math. Stat., Vol. 16 (1945), pp. 330-341.

[2] R. Deltheil, Probabilités géométriques, Gauthier-Villars, Paris, 1926.

[3] H. E. Robbins, "On the measure of a random set," Annals of Math. Stat., Vol. 15
(1944), pp. 70-74.

[4] H. E. Robbins, "On the measure of a random set, II," Annals of Math. Stat., Vol. 16
(1945), pp. 342-347.

[5] L. A. Santaló, "Sobre la medida cinemática en el plano," Abhandlungen aus dem
Mathematischen Seminar der Hamburgischen Universität, Vol. 11 (1936), pp.
222-236.



ON A TEST OF WHETHER ONE OF TWO RANDOM VARIABLES 
IS STOCHASTICALLY LARGER THAN THE OTHER 

By H. B. Mann and D. R. Whitney 
Ohio State University 

1. Summary. Let x and y be two random variables with continuous cumulative
distribution functions f and g. A statistic U depending on the relative ranks
of the x's and y's is proposed for testing the hypothesis f = g. Wilcoxon proposed
an equivalent test in the Biometrics Bulletin, December, 1945, but gave only a
few points of the distribution of his statistic.

Under the hypothesis f = g the probability of obtaining a given U in a sample
of n x's and m y's is the solution of a certain recurrence relation involving n
and m. Using this recurrence relation, tables have been computed giving the
probability of U for samples up to n = m = 8. At this point the distribution is
almost normal.

From the recurrence relation explicit expressions for the mean, variance, and 
fourth moment are obtained. The 2rth moment is shown to have a certain 
form which enabled us to prove that the limit distribution is normal if m, n go to 
infinity in any arbitrary manner. 

The test is shown to be consistent with respect to the class of alternatives 
f(x) > g(x) for every x. 

2. Introduction. Let x and y be two random variables having continuous
cumulative distribution functions f and g respectively. The variable x will be
called stochastically smaller than y if f(a) > g(a) for every a. We wish to test
the hypothesis f = g against the alternative that x is stochastically smaller than
y. Such alternatives are of great importance in testing, for instance, the effect
of treatments on some measurement. One may think of x as the values of
certain measurements in the control group and of y as the values of the same
measurement in a group which received treatment. In a particular instance
the protective effect against infection by certain bacteria was investigated.
Two groups of rats were used in the experiment. The first group received no
treatment; the second group received the drug. Both groups were then infected
with supposedly equally diluted cultures of the bacteria under investigation.
Most of the rats in both groups died, but the time of survival was measured and
it was desired to test whether the drug had the effect of prolonging the life of the
rats. It was desired to make inferences from the effect on rats to the effect the
drug would have on humans. Thus, the only relevant alternative to the hypothesis
that survival times are not influenced by the drug is that the survival
time of those rats which received treatment is stochastically larger than that
of the control group.




3. The U test. Let the quantities $x_1, \ldots, x_n, y_1, \ldots, y_m$ be arranged in
order. This arrangement is unique with probability 1 if $P(x_i = y_j) = 0$, and
this follows from our assumption of continuity. Let U count the number of
times a y precedes an x. If $P(U \le \bar{U}) = \alpha$ under the null hypothesis, the
test will be considered significant on the significance level $\alpha$ if $U \le \bar{U}$, and the
hypothesis of identical distributions of x and y will be rejected.

This test was first proposed by Wilcoxon [1]. His statistic T is the sum of the
ranks of the y's in the ordered sequence of x's and y's. In general

\[ U = mn + \frac{m(m + 1)}{2} - T \]

and this gives a simple way of computing U. Wilcoxon, however, treated only
the case m = n and in this case he tabulated only 3 points of the distribution of
T. Since the test seems of great utility, it seemed worthwhile to compute the
variance, the moments and the limit distribution of U and to investigate the
class of alternatives with respect to which the test is consistent.

Although this paper is written in terms of U and the probabilities of U are
tabulated, the results can easily be interpreted in terms of T if so desired.
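The relation between U and Wilcoxon's rank sum T can be illustrated on a small sample. The following sketch uses hypothetical function names and made-up sample values, and it assumes there are no ties, in line with the continuity assumption.

```python
def u_statistic(xs, ys):
    """Number of (x, y) pairs in which the y precedes (is smaller than) the x."""
    return sum(1 for x in xs for y in ys if y < x)

def rank_sum_of_ys(xs, ys):
    """Wilcoxon's T: sum of the ranks of the y's in the combined ordering."""
    combined = sorted(xs + ys)
    return sum(combined.index(y) + 1 for y in ys)

# Illustrative data only (n = 3 x's, m = 4 y's, no ties).
xs = [1.2, 3.4, 5.6]
ys = [0.5, 2.0, 4.1, 7.7]
n, m = len(xs), len(ys)
U = u_statistic(xs, ys)
T = rank_sum_of_ys(xs, ys)
```

For these values U = 6 and T = 16, and indeed mn + m(m + 1)/2 - T = 12 + 10 - 16 = 6.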


4. The distribution of U. Consider now ordered sequences of $n$ $x$'s and
$m$ $y$'s. Since it is only the relation between $x$ and $y$ that matters we replace
each $x$ by a 0 and each $y$ by a 1. Let $U$ count the number of times a 1 precedes a
0. Let $\bar{p}_{nm}(U)$ be the number of sequences of $n$ 0's and $m$ 1's in each of which a 1
precedes a 0 $U$ times. By examining a sequence with the last term omitted we
arrive at the recurrence relation:

$$\bar{p}_{nm}(U) = \bar{p}_{n-1,m}(U - m) + \bar{p}_{n,m-1}(U),$$

where $\bar{p}_{ij}(U) = 0$ if $U < 0$ and $\bar{p}_{i0}(U)$, $\bar{p}_{0i}(U)$ are zero or one according
as $U \ne 0$ or $U = 0$.

Under the null hypothesis each of the $(m + n)!/(m!\,n!)$ sequences of $n$ 0's and
$m$ 1's is equally likely. Consequently if $p_{nm}(U)$ represents the probability of a
sequence in which a 1 precedes a 0 $U$ times then

(1)  $p_{nm}(U) = \dfrac{n}{n+m}\, p_{n-1,m}(U - m) + \dfrac{m}{n+m}\, p_{n,m-1}(U).$

Using the recurrence relation (1) the probabilities $p_{nm}(U)$ have been tabulated
for $m \le n \le 8$ (see Table I). For $m = n = 8$ the distribution of $U - \tfrac{1}{2}(nm + 1)$
differs only a negligible amount from the normal distribution. We shall, in the
following, derive the mean, the variance, and the fourth moment of $U$, and
prove that the limit distribution of $U$ is normal if $n$ and $m$ both approach infinity
in any arbitrary manner.

It is obvious that $p_{nm}(U) = p_{mn}(U)$.

Since the probability of the $i$th 1 preceding the $j$th 0 is $\tfrac{1}{2}$, we have

(2)  $E_{nm}(U) = nm/2.$
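
The recurrence for the number of sequences lends itself to direct computation. The sketch below is one possible implementation under the conventions of this section; the names `count`, `prob` and `cdf` are hypothetical. It works with the integer counts and divides by the binomial coefficient at the end, so all probabilities are exact fractions.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count(n, m, u):
    """Number of sequences of n 0's and m 1's in which a 1 precedes a 0 exactly u times."""
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    # Last term is a 0 (all m 1's precede it) or a 1 (it precedes no 0).
    return count(n - 1, m, u - m) + count(n, m - 1, u)


def prob(n, m, u):
    """Null probability that U equals u."""
    return Fraction(count(n, m, u), comb(n + m, n))


def cdf(n, m, u):
    """Null probability of obtaining a U not larger than u."""
    return sum(prob(n, m, k) for k in range(u + 1))


print(float(cdf(3, 3, 5)))   # 13/20
print(float(cdf(4, 4, 0)))   # 1/70
mean = sum(k * prob(5, 3, k) for k in range(5 * 3 + 1))
print(mean)                  # nm/2 = 15/2
```

The cumulative values can be compared with the entries of Table I, and the symmetry $p_{nm}(U) = p_{mn}(U)$ and the mean (2) can be confirmed in the same way.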



TABLE I

Probability of Obtaining a U not Larger than that Tabulated in Comparing Samples of
n and m

n = 3

  U    m=1    m=2    m=3
  0   .250   .100   .050
  1   .500   .200   .100
  2   .750   .400   .200
  3          .600   .350
  4                 .500
  5                 .650

n = 4

  U    m=1    m=2    m=3    m=4
  0   .200   .067   .028   .014
  1   .400   .133   .057   .029
  2   .600   .267   .114   .057
  3          .400   .200   .100
  4          .600   .314   .171
  5                 .429   .243
  6                 .571   .343
  7                        .443
  8                        .557

n = 5

  U    m=1    m=2    m=3    m=4    m=5
  0   .167   .047   .018   .008   .004
  1   .333   .095   .036   .016   .008
  2   .500   .190   .071   .032   .016
  3   .667   .286   .125   .056   .028
  4          .429   .196   .095   .048
  5          .571   .286   .143   .075
  6                 .393   .206   .111
  7                 .500   .278   .155
  8                 .607   .365   .210
  9                        .452   .274
 10                        .548   .345
 11                               .421
 12                               .500
 13                               .579

n = 6

  U    m=1    m=2    m=3    m=4    m=5    m=6
  0   .143   .036   .012   .005   .002   .001
  1   .286   .071   .024   .010   .004   .002
  2   .428   .143   .048   .019   .009   .004
  3   .571   .214   .083   .033   .015   .008
  4          .321   .131   .057   .026   .013
  5          .429   .190   .086   .041   .021
  6          .571   .274   .129   .063   .032
  7                 .357   .176   .089   .047
  8                 .452   .238   .123   .066
  9                 .548   .305   .165   .090
 10                        .381   .214   .120
 11                        .457   .268   .155
 12                        .545   .331   .197
 13                               .396   .242
 14                               .465   .294
 15                               .535   .350
 16                                      .409
 17                                      .469
 18                                      .531




TABLE I (Continued)

n = 7

  U    m=1    m=2    m=3    m=4    m=5    m=6    m=7
  0   .125   .028   .008   .003   .001   .001   .000
  1   .250   .056   .017   .006   .003   .001   .001
  2   .375   .111   .033   .012   .005   .002   .001
  3   .500   .167   .058   .021   .009   .004   .002
  4   .625   .250   .092   .036   .015   .007   .003
  5          .333   .133   .055   .024   .011   .006
  6          .444   .192   .082   .037   .017   .009
  7          .556   .258   .115   .053   .026   .013
  8                 .333   .158   .074   .037   .019
  9                 .417   .206   .101   .051   .027
 10                 .500   .264   .134   .069   .036
 11                 .583   .324   .172   .090   .049
 12                        .394   .216   .117   .064
 13                        .464   .265   .147   .082
 14                        .538   .319   .183   .104
 15                               .378   .223   .130
 16                               .438   .267   .159
 17                               .500   .314   .191
 18                               .562   .365   .228
 19                                      .418   .267
 20                                      .473   .310
 21                                      .527   .355
 22                                             .402
 23                                             .451
 24                                             .500
 25                                             .549


TABLE I (Continued)

n = 8

  U    m=1    m=2    m=3    m=4    m=5    m=6    m=7    m=8       t  normal
  0   .111   .022   .006   .002   .001   .000   .000   .000   3.308    .001
  1   .222   .044   .012   .004   .002   .001   .000   .000   3.203    .001
  2   .333   .089   .024   .008   .003   .001   .001   .000   3.098    .001
  3   .444   .133   .042   .014   .005   .002   .001   .001   2.993    .001
  4   .556   .200   .067   .024   .009   .004   .002   .001   2.888    .002
  5          .267   .097   .036   .015   .006   .003   .001   2.783    .003
  6          .356   .139   .055   .023   .010   .005   .002   2.678    .004
  7          .444   .188   .077   .033   .015   .007   .003   2.573    .005
  8          .556   .248   .107   .047   .021   .010   .005   2.468    .007
  9                 .315   .141   .064   .030   .014   .007   2.363    .009
 10                 .387   .184   .085   .041   .020   .010   2.258    .012
 11                 .461   .230   .111   .054   .027   .014   2.153    .016
 12                 .539   .286   .142   .071   .036   .019   2.048    .020
 13                        .341   .177   .091   .047   .025   1.943    .026
 14                        .404   .217   .114   .060   .032   1.838    .033
 15                        .467   .262   .141   .076   .041   1.733    .041
 16                        .533   .311   .172   .095   .052   1.628    .052
 17                               .362   .207   .116   .065   1.523    .064
 18                               .416   .245   .140   .080   1.418    .078
 19                               .472   .286   .168   .097   1.313    .094
 20                               .528   .331   .198   .117   1.208    .113
 21                                      .377   .232   .139   1.102    .135
 22                                      .426   .268   .164    .998    .159
 23                                      .475   .306   .191    .893    .185
 24                                      .525   .347   .221    .788    .215
 25                                             .389   .253    .683    .247
 26                                             .433   .287    .578    .282
 27                                             .478   .323    .473    .318
 28                                             .522   .360    .368    .356
 29                                                    .399    .263    .396
 30                                                    .439    .158    .437
 31                                                    .480    .052    .481
 32                                                    .520







We now seek an expression for $E_{nm}(u^2)$ where $u = U - nm/2$. After multiplying
(1) by $(U - nm/2)^2$, using

$$E_{nm}(u^2) = \sum_{U} (U - nm/2)^2\, p_{nm}(U)$$

and expanding:

(3)  $E_{nm}(u^2) = \dfrac{n}{n+m}\, E_{n-1,m}(u^2) + \dfrac{m}{n+m}\, E_{n,m-1}(u^2) + nm/4,$

where $E_{nm}(u^2)$ denotes the expectation of $(U - nm/2)^2$ in sequences with $n$ 0's
and $m$ 1's. The initial conditions of (3) are seen by direct calculation to be

(4)  $E_{n0}(u^2) = E_{0m}(u^2) = 0.$

By substitution $E_{nm}(u^2) = nm(n + m + 1)/12$ is a solution of the recurrence
relation (3) and its initial conditions (4). Hence, it follows by mathematical
induction that

(5)  $E_{nm}(u^2) = nm(n + m + 1)/12.$

The fourth moment is similarly a solution of the recurrence relation

(6)  $E_{nm}(u^4) = \dfrac{n}{n+m}\, E_{n-1,m}(u^4) + \dfrac{m}{n+m}\, E_{n,m-1}(u^4) + \dfrac{nm}{16}\,(2n^2m + 2nm^2 - n^2 - m^2 - nm),$

which is obtained from (1) by multiplication by $(U - nm/2)^4$ and expansion.
The initial conditions of (6) are found by direct calculation to be

(7)  $E_{n0}(u^4) = E_{0m}(u^4) = 0.$

It may be verified that

(8)  $E_{nm}(u^4) = \dfrac{nm(n + m + 1)}{240}\,(5n^2m + 5nm^2 - 2n^2 - 2m^2 + 3nm - 2n - 2m)$

satisfies the recurrence relation (6) and its initial conditions (7) and hence (8) 
follows by mathematical induction. 
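
The closed forms for the second and fourth moments can be compared with moments computed from the exact null distribution. The following sketch does this for small samples; the helper names are hypothetical, and the counting recurrence is the one given in section 4.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count(n, m, u):
    """Number of sequences of n 0's and m 1's with a 1 preceding a 0 exactly u times."""
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    return count(n - 1, m, u - m) + count(n, m - 1, u)


def central_moment(n, m, r):
    """Exact r-th moment of u = U - nm/2 under the null hypothesis."""
    mean = Fraction(n * m, 2)
    total = comb(n + m, n)
    return sum(count(n, m, u) * (u - mean) ** r for u in range(n * m + 1)) / total


def second_moment_formula(n, m):
    return Fraction(n * m * (n + m + 1), 12)


def fourth_moment_formula(n, m):
    poly = 5*n*n*m + 5*n*m*m - 2*n*n - 2*m*m + 3*n*m - 2*n - 2*m
    return Fraction(n * m * (n + m + 1) * poly, 240)


for n in range(1, 7):
    for m in range(1, 7):
        assert central_moment(n, m, 2) == second_moment_formula(n, m)
        assert central_moment(n, m, 4) == fourth_moment_formula(n, m)
print("second and fourth moment formulas agree for n, m <= 6")
```

Such a check covers only finitely many cases and is not a substitute for the induction argument; it mainly guards against slips in the coefficients.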

To investigate the limit distribution of $u$ as $n$, $m$ become infinite we investigate
the $r$th moment. Following the same procedure as in the case of the second and
fourth moments and using the symmetry of the distribution to find the odd
moments zero we get the following recurrence relation.

(9)  $E_{nm}(u^{2r}) = \dfrac{1}{n+m} \sum_{\alpha=0}^{r} \binom{2r}{2\alpha} \dfrac{1}{4^{\alpha}} \left( n\,m^{2\alpha} E_{n-1,m}(u^{2r-2\alpha}) + m\,n^{2\alpha} E_{n,m-1}(u^{2r-2\alpha}) \right).$

For $r = 1, 2$ it is known that $E_{nm}(u^{2r})$ is a polynomial in $n$ and $m$ of degree $3r$
and that it is divisible by $nm(n + m + 1)$. Assuming that $E_{nm}(u^{2\alpha})$, $\alpha < r$,
is a polynomial in $n$ and $m$ of degree $3\alpha$ divisible by $nm(n + m + 1)$ we will
show that it is possible to find a polynomial of degree $3r$ in $n$ and $m$ divisible by
$nm(n + m + 1)$ which satisfies the recurrence relation (9) for $E_{nm}(u^{2r})$ and
also its initial conditions, namely, $E_{n0}(u^{2r}) = E_{0m}(u^{2r}) = 0$.

The last condition is trivially satisfied if $E_{nm}(u^{2r})$ is divisible by $nm(n + m + 1)$.
Our method here is to actually substitute a polynomial with undetermined
coefficients into (9) and show that the coefficients can be obtained uniquely.
Rearranging (9) we obtain

(10)  $E_{nm}(u^{2r}) - \dfrac{n}{n+m}\, E_{n-1,m}(u^{2r}) - \dfrac{m}{n+m}\, E_{n,m-1}(u^{2r}) = \dfrac{1}{n+m} \sum_{\alpha=1}^{r} \binom{2r}{2\alpha} \dfrac{1}{4^{\alpha}} \left( n\,m^{2\alpha} E_{n-1,m}(u^{2r-2\alpha}) + m\,n^{2\alpha} E_{n,m-1}(u^{2r-2\alpha}) \right).$

Since for $\lambda < r$ we can write $E_{nm}(u^{2\lambda}) = nm(n + m + 1)P^{3\lambda-3}_{nm}$ where $P^{3\lambda-3}_{nm}$ is
a polynomial in $n$, $m$ of degree $3\lambda - 3$ the above equation reduces to

(11)  $E_{nm}(u^{2r}) - \dfrac{n}{n+m}\, E_{n-1,m}(u^{2r}) - \dfrac{m}{n+m}\, E_{n,m-1}(u^{2r}) = nm\,Q^{3r-3}_{nm},$

where $Q^{3r-3}_{nm}$ is a polynomial in $n$, $m$ of degree $3r - 3$.

Now let

$$E_{nm}(u^{2r}) = nm(n + m + 1) \sum_{i+j \le 3r-3} a_{ij}\, n^i m^j,$$

where $a_{ij} = a_{ji}$ are to be determined. Substitution in (11) yields:

$$\sum_{i,j} a_{ij}\left[(n + m + 1)\,n^i m^j - (n - 1)(n - 1)^i m^j - (m - 1)\,n^i (m - 1)^j\right] = Q^{3r-3}_{nm},$$

and rearrangement yields:

(12)  $\sum_{i+j \le 3r-3} a_{ij}\left[n^i m^j + \sum_{\alpha=0}^{i} \binom{i+1}{\alpha} (-1)^{i-\alpha} \left(n^j m^{\alpha} + n^{\alpha} m^j\right)\right] = Q^{3r-3}_{nm}.$

Consider first the terms of degree $3r - 3$. In this case $i + j = 3r - 3$ and
$\alpha = i$ will give

$$\sum_{i=0}^{3r-3} a_{i,3r-3-i}\left[n^i m^{3r-3-i} + (i + 1)\left(n^{3r-3-i} m^i + n^i m^{3r-3-i}\right)\right]$$

or

(13)  $3r \sum_{i=0}^{3r-3} a_{i,3r-3-i}\, n^i m^{3r-3-i}.$

Equating the coefficients of these terms of degree $3r - 3$ to the corresponding
ones in $Q^{3r-3}_{nm}$ it is possible to calculate the value of $a_{i,3r-3-i}$, $(i = 0, \ldots, 3r - 3)$.
We assume now that the $a_{ij}$ are known for $i + j \ge 3r - 3 - (k - 1)$ and
we will find the value of $a_{ij}$ where $i + j = 3r - 3 - k$. Consider then the terms
in (12) of degree $3r - 3 - k$. These terms will occur when

$$i + j = 3r - 3,\ \alpha = i - k; \quad i + j = 3r - 4,\ \alpha = i - k + 1; \quad \ldots; \quad i + j = 3r - 3 - k,\ \alpha = i.$$

All but the last contain coefficients which have already been evaluated. The
last one reduces to

$$(3r - k) \sum_{i=0}^{3r-3-k} a_{i,3r-3-k-i}\, n^i m^{3r-3-k-i}.$$

Thus by equating coefficients $a_{i,3r-3-k-i}$ for $i = 0, 1, \ldots, 3r - 3 - k$ can be
evaluated in terms of the coefficients $a_{ij}$ already known and those in $Q^{3r-3}_{nm}$.
This concludes the proof that $E_{nm}(u^{2r})$ is a polynomial in $n$, $m$ of degree $3r$ and
is divisible by $nm(n + m + 1)$.

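We now investigate the coefficients of the terms of degree $3r$. For $\lambda = 1, 2$

$$E_{nm}(u^{2\lambda}) = \frac{(2\lambda - 1) \cdots 5 \cdot 3 \cdot 1}{12^{\lambda}}\,(nm)^{\lambda}(n + m + 1)^{\lambda} + (\text{terms of degree} < 3\lambda).$$

We assume this to hold for $\lambda < r$ and we will show that it holds for $\lambda = r$.
Substitution reduces the right side of (10) to

$$\frac{1}{n+m}\binom{2r}{2}\frac{1}{4}\left[ nm^2\,\frac{(2r - 3) \cdots 5 \cdot 3 \cdot 1}{12^{r-1}}\,(n - 1)^{r-1} m^{r-1} (n + m)^{r-1} + mn^2\,\frac{(2r - 3) \cdots 5 \cdot 3 \cdot 1}{12^{r-1}}\,n^{r-1} (m - 1)^{r-1} (n + m)^{r-1} \right] + (\text{terms of degree} < 3r - 1)$$

or

$$\frac{r(2r - 1)}{4}\,\frac{(2r - 3) \cdots 5 \cdot 3 \cdot 1}{12^{r-1}}\,(n + m)^{r-2}\left[n(n - 1)^{r-1} m^{r+1} + m(m - 1)^{r-1} n^{r+1}\right] + (\text{terms of degree} < 3r - 1),$$

which reduces to

$$\frac{3r(2r - 1) \cdots 5 \cdot 3 \cdot 1}{12^{r}}\,(nm)^r (n + m)^{r-1} + (\text{terms of degree} < 3r - 1).$$

Comparison of coefficients with (13) multiplied by $nm$ gives

$$nm \cdot 3r \sum_{i=0}^{3r-3} a_{i,3r-3-i}\, n^i m^{3r-3-i} = \frac{3r(2r - 1) \cdots 5 \cdot 3 \cdot 1}{12^{r}}\,(nm)^r (n + m)^{r-1}$$

or

(14)  $E_{nm}(u^{2r}) = \dfrac{(2r - 1) \cdots 5 \cdot 3 \cdot 1}{12^{r}}\,(nm)^r (n + m + 1)^r + (\text{terms of degree} < 3r).$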





We now wish to show that $E_{nm}(u^{2r})$ is at most of degree $2r$ in $n$ or $m$. For
$r = 1, 2$ this has already been established. Assuming that it is true for lower
moments the right side of (10), which reduces to $nm\,Q^{3r-3}_{nm}$, is at most of degree
$2r - 1$ in $n$. We again compare coefficients in (12). First, for terms of degree
$3r - 3$ we have already seen that $n$ has degree at most $2r - 2$. For terms
of degree $3r - 4$ we use $i + j = 3r - 3$, $\alpha = i - 1$ and $i + j = 3r - 4$, $\alpha = i$.
The first case gives rise to no terms in $n$ of degree greater than $2r - 2$ so when we
solve for the coefficients $a_{i,3r-4-i}$ the coefficients of terms in $n$ of degree greater
than $2r - 2$ must be zero. The process repeats and we find no terms in $n$ or $m$
of degree greater than $2r - 2$ in the left side of (12). This gives $E_{nm}(u^{2r})$ at
most the degree $2r$ in $n$ or $m$.

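Now consider the ratio

$$I = \frac{E_{nm}(u^{2r})}{[E_{nm}(u^2)]^r} = \frac{\dfrac{(2r - 1) \cdots 5 \cdot 3 \cdot 1}{12^r}\,(nm)^r (n + m + 1)^r}{[nm(n + m + 1)/12]^r} + \frac{(\text{terms of degree} < 3r;\ \text{in } n \text{ or } m,\ \le 2r)}{[nm(n + m + 1)/12]^r}$$

$$= (2r - 1) \cdots 5 \cdot 3 \cdot 1 + 12^r\,\frac{(\text{terms of degree} < 3r;\ \text{in } n \text{ or } m,\ \le 2r)}{(nm)^r (n + m + 1)^r}.$$

Hence

(15)  $\lim_{n,m \to \infty} I = (2r - 1) \cdots 5 \cdot 3 \cdot 1$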


and by a well known theorem it follows from (15) that the limit distribution is 
normal. 
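
The convergence of the moment ratios to $(2r - 1) \cdots 5 \cdot 3 \cdot 1$ can be observed numerically. The sketch below builds the exact null distribution iteratively from the counting recurrence of section 4 and evaluates the ratio for equal sample sizes; the function names are hypothetical, and the choice $n = m$ is only for illustration.

```python
from fractions import Fraction


def counts(n, m):
    """Return c with c[u] = number of arrangements of n 0's and m 1's having U = u."""
    # table[j] holds the counts for (i, j); for i = 0 only U = 0 is possible.
    table = [[1] for _ in range(m + 1)]
    for i in range(1, n + 1):
        new = [[1]]                               # j = 0: only U = 0
        for j in range(1, m + 1):
            row = [0] * (i * j + 1)
            for u, c in enumerate(table[j]):      # last term is a 0: adds j to U
                row[u + j] += c
            for u, c in enumerate(new[j - 1]):    # last term is a 1: adds nothing
                row[u] += c
            new.append(row)
        table = new
    return table[m]


def moment_ratio(n, m, r):
    """E(u^(2r)) / [E(u^2)]^r for u = U - nm/2 under the null hypothesis."""
    c = counts(n, m)
    total = sum(c)
    mean = Fraction(n * m, 2)
    mu2 = sum(k * (u - mean) ** 2 for u, k in enumerate(c)) / total
    mu2r = sum(k * (u - mean) ** (2 * r) for u, k in enumerate(c)) / total
    return float(mu2r / mu2 ** r)


for size in (5, 10, 30):
    print(size, moment_ratio(size, size, 2), moment_ratio(size, size, 3))
```

For $r = 2$ the ratio increases toward 3 and for $r = 3$ toward 15, the fourth and sixth moments of the standard normal distribution.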


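5. Consistency of the U test. If $f$ and $g$ are the cumulative distribution
functions of the $x$'s and $y$'s then our null hypothesis is $f = g$. The alternatives
admitted are $f(a) > g(a)$ for every $a$. Let $E_A$ denote the expectation under the
alternative.

Defining

$$x_{ij} = \begin{cases} 0 & \text{if } x_i < y_j \\ 1 & \text{if } x_i > y_j \end{cases}$$

we have

$$E_A(x_{ij}) = P(x_i > y_j) = \int_{-\infty}^{\infty} g\, df < \tfrac{1}{2},$$

$$E_A(x_{ij} x_{ik}) = P(x_i > y_j,\ x_i > y_k) = \int_{-\infty}^{\infty} g^2\, df < \tfrac{1}{3},$$

$$E_A(x_{ik} x_{jk}) = P(x_i > y_k,\ x_j > y_k) = \int_{-\infty}^{\infty} (1 - f)^2\, dg < \tfrac{1}{3}.$$

We can now write

$$E_A(x_{ij}) = \tfrac{1}{2} - \lambda, \qquad E_A(x_{ij} x_{ik}) = \tfrac{1}{3} - \epsilon_1, \qquad E_A(x_{ik} x_{jk}) = \tfrac{1}{3} - \epsilon_2,$$

where $\lambda$, $\epsilon_1$, $\epsilon_2$ are positive numbers.

We have then

$$\sigma^2_A(x_{ij}) = \tfrac{1}{4} - \lambda^2, \qquad \sigma_A(x_{ij} x_{ik}) = \tfrac{1}{12} - \epsilon_1 + \lambda - \lambda^2,$$

$$\sigma_A(x_{ij} x_{kl}) = 0 \ \text{for } i \ne k,\ j \ne l, \qquad \sigma_A(x_{ik} x_{jk}) = \tfrac{1}{12} - \epsilon_2 + \lambda - \lambda^2.$$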

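Now

(16)  $E_A(U) = \sum E_A(x_{ij}) = nm/2 - \lambda nm$

and

(17)  $\sigma^2_A(U) = \sum \sigma^2_A(x_{ij}) + \sum \sigma_A(x_{ij} x_{ik}) + \sum \sigma_A(x_{ik} x_{jk}) + \sum \sigma_A(x_{ij} x_{kl})$

or

$$\sigma^2_A(U) = nm(n + m + 1)/12 + nm\left[-\lambda^2(n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right].$$

Let the critical region under the null hypothesis consist of those $U$'s satisfying
$nm/2 - U > t_n \sigma$, where $\sigma^2 = nm(n + m + 1)/12$ and $\lim_{n \to \infty} t_n = t$. Then

$$P(nm/2 - U > t_n \sigma \mid A) = P(E_A(U) - U > k\,\sigma_A) \quad \text{where } k = \frac{t_n \sigma - \lambda nm}{\sigma_A},$$

and by Tchebycheff's inequality, since for large values of $n$, $m$ we have $k < 0$,

$$P(nm/2 - U > t_n \sigma \mid A) \ge 1 - \frac{1}{k^2},$$

which by (5) and (17) gives

$$P(nm/2 - U > t_n \sigma \mid A) \ge 1 - \frac{nm(n + m + 1)/12 + nm\left[-\lambda^2(n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right]}{\left(t_n \sqrt{nm(n + m + 1)/12} - \lambda nm\right)^2}$$

$$= 1 - \frac{1 + \dfrac{12}{n + m + 1}\left[-\lambda^2(n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right]}{\left(t_n - \lambda\sqrt{\dfrac{12\,nm}{n + m + 1}}\right)^2}.$$

We obtain then that

$$\lim_{n,m \to \infty} P(nm/2 - U > t_n \sigma \mid A) = 1$$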

which is the requirement for consistency. 
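
The behaviour of the Tchebycheff bound can be traced for a concrete alternative. The sketch below assumes, purely for illustration, that the $x$'s are uniform on $(0, 1)$ and the $y$'s uniform on $(\delta, 1 + \delta)$, for which $\int g\,df = (1 - \delta)^2/2$ and $\int g^2\,df = \int (1 - f)^2\,dg = (1 - \delta)^3/3$; the function name, the value of $\delta$ and the constant $t = 1.645$ are hypothetical choices, not taken from the paper.

```python
from math import sqrt


def chebyshev_power_bound(n, m, lam, eps1, eps2, t):
    """Lower bound 1 - sigma_A^2 / (t*sigma - lam*n*m)^2 from (5) and (17).

    Returns None when t*sigma - lam*n*m >= 0, where the argument does not apply.
    """
    sigma0_sq = n * m * (n + m + 1) / 12
    sigma_a_sq = sigma0_sq + n * m * (-lam ** 2 * (n + m - 1)
                                      + (lam - eps1) * (m - 1)
                                      + (lam - eps2) * (n - 1))
    gap = t * sqrt(sigma0_sq) - lam * n * m
    if gap >= 0:
        return None
    return 1 - sigma_a_sq / gap ** 2


delta = 0.2                                  # assumed shift of the y's
lam = 0.5 - (1 - delta) ** 2 / 2             # E_A(x_ij) = 1/2 - lambda
eps = 1 / 3 - (1 - delta) ** 3 / 3           # eps1 = eps2 for this alternative

for size in (10, 50, 500):
    print(size, chebyshev_power_bound(size, size, lam, eps, eps, t=1.645))
```

For small samples the bound is vacuous, but it rises toward 1 as $n$ and $m$ grow, which is what the consistency argument requires. The bound is crude and should not be read as the actual power of the test.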





6. Comparison with other tests. Another test which might seem appropriate 
for the comparison of a control group with a group receiving treatment is the 
test introduced by Wald and Wolfowitz [2]. The test by Wald and Wolfowitz is 
consistent with respect to every alternative g. However in the case considered 
we are only interested in the alternative hypothesis that measurements in the 
group receiving treatment are stochastically larger than in the control group. 
Intuitively, it seems that the test proposed here is more efficient for detecting the 
particular alternative considered than the test proposed by Wald and Wolfowitz. 
This intuitive feeling was borne out by the results of the test in the particular 
experiment described in the introduction. All in all, 62 experiments were 
conducted using various bacteria in different solutions and various amounts of 
the protective drug. The U Test gave 14 significant results on the 5% level 
and 4 on the 1% level. The test of Wald and Wolfowitz gave 7 significant 
results on the 5% level and 2 on the 1% level. A final decision between the two 
tests can, of course, only be arrived at on the basis of their power functions, 
which present formidable difficulties. 

In comparing the two statistics it was noted that a slight dislocation of a
value may cause a significant change in the number of runs more easily than it can
cause a significant change in the statistic proposed here. For instance, in the
sequence … both statistics would give a probability less than
.05. If, however, the sequence is slightly altered to $x_1 x_2 x_3 x_4 x_5 y_1 x_6 y_2 y_3 y_4 y_5 y_6$,
$P(\text{number of runs} \le 4) > .05$ while $P(U \le 1) = .002$.
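
The two probabilities quoted for the altered sequence can be reproduced by enumerating all arrangements of six $x$'s and six $y$'s, which are equally likely under the null hypothesis. The following is a brute-force sketch with hypothetical helper names; it is practical only for samples this small.

```python
from fractions import Fraction
from itertools import combinations


def u_and_runs(seq):
    """Return (U, number of runs) for a string of 'x' and 'y'."""
    u = 0
    ys_seen = 0
    for symbol in seq:
        if symbol == "y":
            ys_seen += 1
        else:
            u += ys_seen          # every earlier y precedes this x
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return u, runs


def all_sequences(n, m):
    """Yield every arrangement of n x's and m y's."""
    for y_positions in combinations(range(n + m), m):
        seq = ["x"] * (n + m)
        for position in y_positions:
            seq[position] = "y"
        yield "".join(seq)


altered = "xxxxxyxyyyyy"
u_obs, runs_obs = u_and_runs(altered)
stats = [u_and_runs(s) for s in all_sequences(6, 6)]
p_u = Fraction(sum(1 for u, _ in stats if u <= u_obs), len(stats))
p_runs = Fraction(sum(1 for _, r in stats if r <= runs_obs), len(stats))
print(u_obs, runs_obs, float(p_u), float(p_runs))
```

Of the 924 arrangements, 2 have $U \le 1$ and 62 have at most 4 runs, giving probabilities of about .002 and .067 respectively.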

After completion of the present paper it came to the authors' attention that
the U test had already been proposed by K. K. Mathen [3]. However, Mathen's
distribution of $U$ is incorrect and its derivation erroneous, since it assumes
independence of the random variables $x_{ij}$ as defined in section 5 of the present
paper, while obviously $x_{ij}$ and $x_{ik}$ are not independent.

REFERENCES 

[1] Frank Wilcoxon, "Individual comparisons by ranking methods", Biometrics Bull.,
Vol. 1 (1945), pp. 80-83.

[2] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same
population", Annals of Math. Stat., Vol. 11 (1940), pp. 147-162.

[3] K. K. Mathen, Sankhya, 1946, p. 329.



ON THE CONVERGENCE OF SEQUENCES OF MOMENT 
GENERATING FUNCTIONS 

By W. Kozakiewicz 


University of Saskatchewan 

1. Summary. The purpose of this paper is to give a few theorems concerning
the reciprocal relation between the convergence of a sequence of distribution
functions and the convergence of the corresponding sequence of their
moment generating functions.

The paper consists of two parts. In the first part the univariate case is
discussed. The content of this part is closely related to that of a recent paper
by J. H. Curtiss [1, p. 430-433], but the results are of a somewhat more general
nature, and the methods of proofs are different and do not make use of the theory
of a complex variable. The second part deals with the multivariate case which,
as far as the author knows, has not been treated before with proofs in as complete
and rigorous a way.

In both the univariate and multivariate cases the proofs are based on the well
known Helly selection principle [2, p. 26] for bounded sequences of monotonic
functions.

2. The univariate case. Let $X$ be a random variable and $F(x)$ its distribution
function. That is, for any real $x$, $F(x) = P\{X \le x\}$, where $P\{X \le x\}$ denotes
the probability of the event $X \le x$. The function

$$\varphi(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx}\, dF(x),$$

in which the integral is taken in the Stieltjes-Riemann sense and is assumed to
converge in some neighborhood of the origin, is called the moment generating
function of $X$ (or of $F(x)$).

Henceforth we use the abbreviations d.f. and m.g.f. for the terms distribution
function and moment generating function respectively. The variable $t$ will
always be real.

Theorem 1. Let $\{F_n(x)\}$ be a sequence of d.f.'s. Let $M(x)$ for any fixed
non-negative $x$ be the least upper bound of the sequence $\{F_n(-x) + 1 - F_n(x)\}$.
If the sequence $\{F_n(x)\}$ converges on an everywhere dense set of points on the x-axis,
and if there exists a positive number $\alpha$ such that for any fixed $t$ in the interval $|t| < \alpha$

(1)  $\lim_{x \to +\infty} e^{|t|x} M(x) = 0,$

then:

(a) there exists a d.f. $F(x)$ such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of continuity
of $F(x)$;

(b) the m.g.f.'s of $F(x)$ and $F_n(x)$, say $\varphi(t)$ and $\varphi_n(t)$, exist for $|t| < \alpha$;

(c) $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for $|t| < \alpha$ and uniformly in each interval $|t| \le \beta < \alpha$.





To prove (a), it may be noticed that there exists a function $F(x)$, non-decreasing
and continuous on the right, such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of
continuity of $F(x)$. But $F(x)$ must be a distribution function. Indeed, we
have for $x > 0$

(2)  $F(-x) + 1 - F(x) \le M(x-).$

Now from (1), putting $t = 0$, we find that $M(x)$ and consequently $M(x-)$
approach zero as $x \to +\infty$. This proves that $F(-\infty) = 0$ and $F(+\infty) = 1$.
To prove (b), we notice first that the integral

$$\varphi_n(t) = \int_{-\infty}^{+\infty} e^{xt}\, dF_n(x) \qquad (n = 1, 2, \ldots),$$

is convergent for $|t| < \alpha$. This follows immediately from (1) by applying the
method of integration by parts to the integrals

$$\int_{0}^{N} e^{xt}\, dF_n(x) \quad \text{and} \quad \int_{-N}^{0} e^{xt}\, dF_n(x),$$

which for any $t$ in the interval $|t| < \alpha$ will be seen to be bounded for all values of
$N$. By the same argument, the relation $\lim_{x \to +\infty} M(x-)e^{|t|x} = 0$, $|t| < \alpha$, which
can be easily deduced from (1), together with (2) imply that the integral
representing $\varphi(t)$ is convergent for $|t| < \alpha$.

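Let now $\beta$ be a positive number less than $\alpha$ and let $\gamma$ be such that $\beta < \gamma < \alpha$.
Let $M_\gamma$ be the least upper bound of $M(x)e^{\gamma x}$ for $x \ge 0$. Using the method of
integration by parts and applying (1) we have for $|t| \le \beta$

(3)  $\int_{N}^{+\infty} e^{xt}\, dF_n(x) = [1 - F_n(N)]\,e^{Nt} + t\int_{N}^{+\infty} e^{xt}[1 - F_n(x)]\, dx \le M(N)e^{N\beta} + M_\gamma\,\beta\,\dfrac{e^{N(\beta - \gamma)}}{\gamma - \beta}.$

We could prove easily that the same inequality is true for the integrals

$$\int_{-\infty}^{-N} e^{xt}\, dF_n(x), \qquad \int_{N}^{+\infty} e^{xt}\, dF(x), \qquad \int_{-\infty}^{-N} e^{xt}\, dF(x).$$

Now let $\epsilon$ be any positive number. Because of (3), we have

(4)  $\int_{|x| > N_0} e^{xt}\, dF_n(x) < \epsilon, \qquad \int_{|x| > N_0} e^{xt}\, dF(x) < \epsilon,$

for a sufficiently great number $N_0$, and uniformly with respect to $n$ and $t$, when
$|t| \le \beta$. Clearly, $N_0$ can be so chosen that $F(x)$ is continuous for $x = \pm N_0$.
Then

(5)  $\lim_{n \to \infty} \int_{-N_0}^{N_0} e^{xt}\, dF_n(x) = \int_{-N_0}^{N_0} e^{xt}\, dF(x),$

uniformly for $|t| \le \beta$.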




The relations (4) and (5) prove that $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$, uniformly for
$|t| \le \beta$. But $\beta$ can be chosen as near to $\alpha$ as we please; thus (c) is proved.

Theorem 2. Let $\{F_n(x)\}$ be a sequence of d.f.'s and $\{\varphi_n(t)\}$ the corresponding
sequence of m.g.f.'s. If $\varphi_n(t)$ exists for $|t| < \alpha$, and if there exists a finite valued
function $\varphi(t)$ defined for $|t| < \alpha$, such that $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for $|t| < \alpha$, then

(a) $\lim_{x \to +\infty} M(x)e^{|t|x} = 0$ for $|t| < \alpha$;

(b) there exists a d.f. $F(x)$ such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of continuity
of $F(x)$;

(c) the m.g.f. of $F(x)$ exists for $|t| < \alpha$ and is identically equal to $\varphi(t)$ in this interval;

(d) $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ uniformly in each interval $|t| \le \beta < \alpha$.

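To prove (a), let $t$ be a number in the interval $|t| < \alpha$, and let $\beta$ be chosen so
that $|t| < \beta < \alpha$. Then, for $x > 0$, we have

$$F_n(-x) + 1 - F_n(x) = \int_{-\infty}^{-x} dF_n(u) + \int_{x}^{+\infty} dF_n(u)$$

$$\le e^{-\beta x}\int_{-\infty}^{-x} e^{-\beta u}\, dF_n(u) + e^{-\beta x}\int_{x}^{+\infty} e^{\beta u}\, dF_n(u)$$

$$\le e^{-\beta x}\left[\varphi_n(-\beta) + \varphi_n(\beta)\right].$$

Consequently

$$M(x)e^{|t|x} \le e^{(|t| - \beta)x}\ \underset{n}{\mathrm{l.u.b.}}\,\{\varphi_n(-\beta) + \varphi_n(\beta)\},$$

and since the sequences $\{\varphi_n(-\beta)\}$ and $\{\varphi_n(\beta)\}$ are convergent, and therefore
bounded, it follows that $M(x)e^{|t|x}$ approaches zero as $x \to +\infty$.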
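
The inequality $F(-x) + 1 - F(x) \le e^{-\beta x}[\varphi(-\beta) + \varphi(\beta)]$ used in this proof can be checked numerically for a single d.f. The sketch below takes the standard normal distribution as an assumed example, for which the left side equals $\mathrm{erfc}(x/\sqrt{2})$ and the m.g.f. is $e^{t^2/2}$; the function names are hypothetical.

```python
from math import erfc, exp, sqrt


def two_sided_tail(x):
    """F(-x) + 1 - F(x) for the standard normal d.f."""
    return erfc(x / sqrt(2))


def mgf(t):
    """Moment generating function of the standard normal distribution."""
    return exp(t * t / 2)


def tail_bound(x, beta):
    """Right side of the inequality: exp(-beta x) [phi(-beta) + phi(beta)]."""
    return exp(-beta * x) * (mgf(-beta) + mgf(beta))


for x in (0.5, 1.0, 2.0, 4.0):
    for beta in (0.5, 1.0, 2.0):
        assert two_sided_tail(x) <= tail_bound(x, beta)
        print(x, beta, two_sided_tail(x), tail_bound(x, beta))
```

The bound is far from sharp, but it decays like $e^{-\beta x}$, which is all that part (a) of the theorem needs.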

To prove (b) we may notice that by the Helly selection principle we can
choose a subsequence $\{F_{n_k}(x)\}$ which is convergent to some non-decreasing
function $F(x)$, at each point of continuity of $F(x)$. Now Theorem 1 together
with (a) imply that $F(x)$ is a d.f. and that the limit of the subsequence $\{\varphi_{n_k}(t)\}$,
namely $\varphi(t)$, must be identical, for $|t| < \alpha$, with the m.g.f. of $F(x)$. By the
uniqueness property of a m.g.f. we know that $F(x)$ is uniquely determined by
$\varphi(t)$, and therefore it follows that every convergent subsequence of $\{F_n(x)\}$
approaches the same limit $F(x)$ at each point of continuity of $F(x)$. This is,
however, equivalent to the statement that the sequence $\{F_n(x)\}$ itself converges
to $F(x)$ at each point of continuity of $F(x)$. Thus (b) is proved. We see at
once that (c) and (d) follow immediately from Theorem 1.

Theorem 2 is of course similar to Theorem 3 in the paper of Curtiss [1,
p. 432]. The proof of (a), however, is not contained in his paper. From
Theorems 1 and 2 there follows immediately

Theorem 3. Let $\{F_n(x)\}$ be a sequence of d.f.'s, and let $\{\varphi_n(t)\}$ be the corresponding
sequence of m.g.f.'s, which are all assumed to exist for $|t| < \alpha$. The necessary
and sufficient conditions for the convergence of $\{\varphi_n(t)\}$ in the interval $|t| < \alpha$ are:

(a) $\lim_{x \to +\infty} M(x)e^{|t|x} = 0$, $|t| < \alpha$;

(b) the sequence $\{F_n(x)\}$ converges to a d.f. $F(x)$ at each point of continuity of $F(x)$.

Further, the m.g.f. of $F(x)$ exists for $|t| < \alpha$ and is equal in this interval to the limit
of the sequence $\{\varphi_n(t)\}$.

In his paper Curtiss gives an example of a sequence $\{F_n(x)\}$ of d.f.'s which
converges to a d.f. $F(x)$, while the corresponding sequence $\{\varphi_n(t)\}$ of m.g.f.'s does
not converge to the m.g.f. $\varphi(t)$ of the d.f. $F(x)$, though both $\varphi_n(t)$, $(n = 1, 2, \ldots)$,
and $\varphi(t)$ exist for all $t$. It may be easily proved by the direct method that in the
case considered the condition (a) of Theorem 3 is not satisfied.

It is perhaps worth while to notice that the condition (a) of Theorem 3 may
be expressed also as follows:

$$\limsup_{x \to +\infty}\ x^{-1} \log M(x) \le -\alpha.$$
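
How condition (a) can fail is easy to see on a simple sequence. The example below is an assumption made for illustration and is not the example given by Curtiss: $F_n$ places probability $1 - 1/n$ at 0 and $1/n$ at $n$, so that $F_n$ converges to the d.f. with unit mass at 0, whose m.g.f. is identically 1. The function names are hypothetical.

```python
from math import exp, floor, log


def phi_n(n, t):
    """m.g.f. of the d.f. with mass 1 - 1/n at 0 and mass 1/n at n."""
    return (1 - 1 / n) + exp(n * t) / n


def M(x):
    """l.u.b. over n of F_n(-x) + 1 - F_n(x), for x >= 0.

    The tail 1 - F_n(x) equals 1/n when n > x and 0 otherwise, so the
    least upper bound is attained at the smallest integer n exceeding x.
    """
    return 1 / (floor(x) + 1)


t = 0.1
for n in (10, 100, 200):
    print(n, phi_n(n, t), phi_n(n, -t))      # diverges for t > 0, tends to 1 for t < 0

for x in (10.0, 100.0, 200.0):
    print(x, exp(t * x) * M(x), log(M(x)) / x)
```

Here $e^{|t|x}M(x)$ grows without bound for every $t \ne 0$ and $x^{-1}\log M(x)$ tends to 0, so neither form of condition (a) holds for any positive $\alpha$, in agreement with the divergence of $\varphi_n(t)$ for $t > 0$.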

3. The multivariate case. For the sake of simplicity we shall consider here 
the bivariate case only. The results obtained in this chapter, can be, however, 
easily extended to the case when d.f.’s and m.g.f.’s are defined in the Euclidean 
space of any finite number of dimensions. 

Let $(X_1, X_2)$ be a random vector variable in the two-dimensional Euclidean
space, and let $F(x_1, x_2)$ be its d.f. That is, for any real numbers $x_1$ and $x_2$,

$$F(x_1, x_2) = P\{X_1 \le x_1,\ X_2 \le x_2\}.$$

Let

$$F_1(x_1) = P\{X_1 \le x_1\} = F(x_1, +\infty),$$

$$F_2(x_2) = P\{X_2 \le x_2\} = F(+\infty, x_2);$$

then $F_1(x_1)$ and $F_2(x_2)$ are called the marginal d.f.'s of $X_1$ and $X_2$ respectively.
The m.g.f.'s of the d.f.'s $F(x_1, x_2)$, $F_1(x_1)$ and $F_2(x_2)$ are defined by the equations:

$$\varphi(t_1, t_2) = E(e^{X_1 t_1 + X_2 t_2}) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2),$$

$$\varphi_i(t_i) = E(e^{X_i t_i}) = \int_{-\infty}^{+\infty} e^{x_i t_i}\, dF_i(x_i), \qquad (i = 1, 2),$$

in which the integrals are assumed to converge in some neighborhood of the
origin. It is easy to see that $\varphi_1(t_1) = \varphi(t_1, 0)$ and $\varphi_2(t_2) = \varphi(0, t_2)$.
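
The relation between the joint and the marginal m.g.f.'s can be illustrated on a discrete distribution. The four-point joint distribution below is hypothetical, as are the function names; for a discrete d.f. the Stieltjes integrals reduce to finite sums.

```python
from math import exp, isclose

# Hypothetical joint distribution: {(x1, x2): probability}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def mgf_joint(t1, t2):
    """phi(t1, t2) = E exp(X1 t1 + X2 t2)."""
    return sum(p * exp(x1 * t1 + x2 * t2) for (x1, x2), p in joint.items())


def marginal(index):
    """Marginal distribution of the coordinate with the given index (0 or 1)."""
    out = {}
    for point, p in joint.items():
        out[point[index]] = out.get(point[index], 0.0) + p
    return out


def mgf_marginal(index, t):
    """phi_i(t) = E exp(X_i t), computed from the marginal distribution."""
    return sum(p * exp(x * t) for x, p in marginal(index).items())


print(mgf_marginal(0, 0.7), mgf_joint(0.7, 0.0))
print(mgf_marginal(1, -0.3), mgf_joint(0.0, -0.3))
```

Each pair of printed values agrees up to rounding, as the identities $\varphi_1(t_1) = \varphi(t_1, 0)$ and $\varphi_2(t_2) = \varphi(0, t_2)$ require.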

Theorem 4. Let $\varphi(t_1, t_2)$ and $\varphi^*(t_1, t_2)$ be the m.g.f.'s of d.f.'s $F(x_1, x_2)$ and
$F^*(x_1, x_2)$ respectively. If $\varphi(t_1, t_2)$ and $\varphi^*(t_1, t_2)$ exist and are equal in some
neighborhood of the origin $|t_i| < \alpha_i$, $(i = 1, 2)$, then $F(x_1, x_2) = F^*(x_1, x_2)$
identically.

To prove this theorem, let us introduce two random vector variables $(X_1, X_2)$
and $(X_1^*, X_2^*)$ of which the d.f.'s are respectively $F$ and $F^*$. Consider now two
random variables

$$Z = X_1 t_1 + X_2 t_2, \qquad Z^* = X_1^* t_1 + X_2^* t_2,$$

where $t_1$ and $t_2$ denote two real numbers not both zero. If $\varphi(t)$ and $\varphi^*(t)$ are
respectively the m.g.f.'s of $Z$ and $Z^*$, we have

$$\varphi(t) = \varphi(t t_1, t t_2), \qquad \varphi^*(t) = \varphi^*(t t_1, t t_2).$$

Consequently $\varphi(t) = \varphi^*(t)$ provided that $|t t_i| < \alpha_i$, $(i = 1, 2)$. It follows from
the uniqueness property of the m.g.f. in the univariate case that the d.f.'s of
$Z$ and $Z^*$ must be identical. Now, according to a theorem due to Cramér
[3, p. 105], if the d.f.'s of $Z$ and $Z^*$ coincide for all pairs of values $(t_1, t_2)$ such that
$|t_1| + |t_2| \ne 0$, the d.f.'s $F$ and $F^*$ must be identical. It may be worth while to
reproduce here Cramér's proof. Let $\psi(t_1, t_2) = E(e^{i(X_1 t_1 + X_2 t_2)})$ and $\psi^*(t_1, t_2) =
E(e^{i(X_1^* t_1 + X_2^* t_2)})$ be the characteristic functions of $F$ and $F^*$ respectively.
Then $\psi(t t_1, t t_2)$ and $\psi^*(t t_1, t t_2)$ are the characteristic functions of $Z$ and $Z^*$
respectively. Since $Z$ and $Z^*$ have the same d.f.'s, it follows that $\psi(t t_1, t t_2) =
\psi^*(t t_1, t t_2)$ for all values of $t$. Putting $t = 1$, we find that $\psi(t_1, t_2) = \psi^*(t_1, t_2)$
if $|t_1| + |t_2| \ne 0$. For $t_1 = t_2 = 0$, $\psi(0, 0) = \psi^*(0, 0) = 1$. Therefore $\psi(t_1, t_2) =
\psi^*(t_1, t_2)$ identically, and since the characteristic function uniquely determines
the d.f., it follows that the d.f.'s $F$ and $F^*$ are identical.

Theorem 5. Let $\{F_n(x_1, x_2)\}$ be a sequence of d.f.'s. Let $F_{1n}(x_1)$ and $F_{2n}(x_2)$
be respectively the marginal d.f.'s determined by $F_n(x_1, x_2)$. Let

$$M_i(x_i) = \underset{n}{\mathrm{l.u.b.}}\,\{F_{in}(-x_i) + 1 - F_{in}(x_i)\},$$

where $x_i \ge 0$, $(i = 1, 2)$. If there exist positive numbers $\alpha_1$ and $\alpha_2$ such that for
$|t_i| < \alpha_i$

(6)  $\lim_{x_i \to +\infty} e^{|t_i| x_i} M_i(x_i) = 0, \qquad (i = 1, 2),$

and if $\{F_n(x_1, x_2)\}$ converges on an everywhere dense set on the $(x_1, x_2)$ plane,
then:

(a) there exists a d.f. $F(x_1, x_2)$ such that $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$ at each point
of continuity of $F(x_1, x_2)$;

(b) there exist two positive numbers $\delta_1$ and $\delta_2$, $\delta_i \le \alpha_i$, such that the m.g.f.'s of
$F(x_1, x_2)$ and $F_n(x_1, x_2)$, say $\varphi(t_1, t_2)$ and $\varphi_n(t_1, t_2)$, exist for $|t_i| < \delta_i$, $(i = 1, 2)$;

(c) $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ for $|t_i| < \delta_i$, and uniformly in each two-dimensional
interval $|t_i| \le \beta_i < \delta_i$, $(i = 1, 2)$.

To prove (a), we notice that there obviously exists a function $F(x_1, x_2)$,
continuous on the right with respect to each variable, satisfying the relation

$$\Delta^2 F(x_1, x_2) = F(x_1'', x_2'') + F(x_1', x_2') - F(x_1', x_2'') - F(x_1'', x_2') \ge 0$$

for $x_1' < x_1''$, $x_2' < x_2''$, and such that

(7)  $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$

at each point of continuity of $F(x_1, x_2)$. We shall prove that $F(x_1, x_2)$ is a d.f.
In fact, it is easy to see that we have for $x_i > 0$, $(i = 1, 2)$,

(8)  $F(-x_1, -x_2) \le F(-x_1, x_2) \le M_1(x_1-), \qquad F(x_1, -x_2) \le M_2(x_2-),$
     $1 - F(x_1, x_2) \le M_1(x_1) + M_2(x_2).$

Now, according to (6), $\lim_{x_i \to +\infty} M_i(x_i-) = \lim_{x_i \to +\infty} M_i(x_i) = 0$, $(i = 1, 2)$, therefore it
follows from (8) that $F(-\infty, -\infty) = F(-\infty, x_2) = F(x_1, -\infty) = 0$ and
$F(+\infty, +\infty) = 1$, which proves that $F(x_1, x_2)$ is a d.f.

To prove (b), let $\varphi_{in}(t_i)$ be the m.g.f. of the d.f. $F_{in}(x_i)$, $(i = 1, 2)$. Let
$F_1(x_1)$ and $F_2(x_2)$ be the marginal d.f.'s determined by $F(x_1, x_2)$ and let $\varphi_i(t_i)$ be
the m.g.f. of $F_i(x_i)$, $(i = 1, 2)$.

Now let $N' > N > 0$ and

$$R_n(N, N', t_1, t_2) = \int_{-N'}^{N'}\int_{-N'}^{N'} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2) - \int_{-N}^{N}\int_{-N}^{N} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2) = I_1 + I_2 + I_3 + I_4,$$

where $I_1 = \int_{N}^{N'}\int_{-N'}^{N'} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2)$ and $I_2$, $I_3$, $I_4$ are the integrals of the
same integrand over the three remaining rectangles into which the region between
the two squares is divided.

Applying the Schwarz inequality to $I_1$, we find

(9)  $I_1 \le \left(\int_{N}^{N'}\int_{-N'}^{N'} e^{2x_1 t_1}\, dF_n(x_1, x_2)\right)^{1/2} \left(\int_{N}^{N'}\int_{-N'}^{N'} e^{2x_2 t_2}\, dF_n(x_1, x_2)\right)^{1/2}.$

But

(10)  $\int_{N}^{N'}\int_{-N'}^{N'} e^{2x_1 t_1}\, dF_n(x_1, x_2) \le \int_{N}^{N'} e^{2x_1 t_1}\, dF_{1n}(x_1),$

and similarly

(11)  $\int_{N}^{N'}\int_{-N'}^{N'} e^{2x_2 t_2}\, dF_n(x_1, x_2) \le \int_{-N'}^{N'} e^{2x_2 t_2}\, dF_{2n}(x_2).$

Let $\epsilon$ be any positive number and $\gamma_i$ a positive number less than $\alpha_i$, $(i = 1, 2)$.
It follows from the proof of Theorem 1, taking into account (6), that the
integrals representing $\varphi_{in}(t_i)$ and $\varphi_i(t_i)$, $(i = 1, 2)$, exist and are uniformly
convergent with respect to $n$ and $t_i$, when $|t_i| \le \gamma_i$, $(i = 1, 2)$. Consequently
we have

(12)  $\int_{|x_i| > N} e^{x_i t_i}\, dF_{in}(x_i) < \epsilon, \qquad \int_{|x_i| > N} e^{x_i t_i}\, dF_i(x_i) < \epsilon, \qquad (i = 1, 2),$

uniformly with respect to $n$ and $t_i$ when $|t_i| \le \gamma_i$, $(i = 1, 2)$, provided that $N$ is
sufficiently large, say $N > N_0$. Let us take $\beta_i = \gamma_i/2$, $(i = 1, 2)$. The integrals
representing $\varphi_{in}(t_i)$ and $\varphi_i(t_i)$, $(i = 1, 2)$, are obviously uniformly bounded for all
$n$ and when $|t_i| \le \gamma_i$, $(i = 1, 2)$, they are all less than some constant $C$.
Consequently taking into account (9), (10), (11), and (12), we find

$$I_1 \le \sqrt{C\epsilon},$$

uniformly with respect to $n$ and $t_i$ when $|t_i| \le \beta_i$, $(i = 1, 2)$, provided that
$N' > N > N_0$. Since the same inequality is true for $I_2$, $I_3$ and $I_4$, we have

(13)  $R_n(N, N', t_1, t_2) \le 4\sqrt{C\epsilon},$

uniformly with respect to $n$ and $t_i$, when $|t_i| \le \beta_i$, $(i = 1, 2)$, provided $N' >
N > N_0$. Hence the integral representing $\varphi_n(t_1, t_2)$ is uniformly convergent for
$|t_i| \le \beta_i$, and consequently convergent for $|t_i| < \alpha_i/2$, $(i = 1, 2)$, since $\beta_i$
can be chosen as near to $\alpha_i/2$ as we please.

Similarly, using (12), we could find

(14)  $R(N, N', t_1, t_2) \le 4\sqrt{C\epsilon}, \qquad |t_i| \le \beta_i,\ N' > N > N_0,$

where

$$R(N, N', t_1, t_2) = \int_{-N'}^{N'}\int_{-N'}^{N'} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2) - \int_{-N}^{N}\int_{-N}^{N} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2).$$

This proves, in turn, that the integral representing $\varphi(t_1, t_2)$ is uniformly
convergent for $|t_i| \le \beta_i$ and convergent for $|t_i| < \alpha_i/2$, $(i = 1, 2)$. Thus (b) is
proved with $\delta_i = \alpha_i/2$, $(i = 1, 2)$.

To prove (c), let $N' \to +\infty$ and $N = N_0$ in (13) and (14). We obtain

(15)  $R_n(N_0, +\infty, t_1, t_2) \le 4\sqrt{C\epsilon}, \qquad R(N_0, +\infty, t_1, t_2) \le 4\sqrt{C\epsilon},$

uniformly with respect to $n$ and $t_i$ when $|t_i| \le \beta_i$.

Clearly, $N_0$ can be chosen so that $F_1(x_1)$ and $F_2(x_2)$ are continuous for $x_1 = \pm N_0$,
$x_2 = \pm N_0$. Then

(16)  $\lim_{n \to \infty} \int_{-N_0}^{N_0}\int_{-N_0}^{N_0} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2) = \int_{-N_0}^{N_0}\int_{-N_0}^{N_0} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2),$

uniformly for $|t_i| \le \beta_i$, $(i = 1, 2)$.

The relations (15) and (16) prove that

$$\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2),$$

uniformly for $|t_i| \le \beta_i$, $(i = 1, 2)$. The ordinary convergence obviously holds
for $|t_i| < \alpha_i/2$, $(i = 1, 2)$.

It follows from the above proof, which refers to the bivariate case, that we
may take $\delta_i = \alpha_i/2$, $(i = 1, 2)$, in (b) and (c).

The existence of the corresponding numbers $\delta_i$, $\delta_i \le \alpha_i$, $(i = 1, 2, \ldots, k)$,
in the $k$-variate case can be easily established by the repeated application of the
Schwarz inequality.





Theorem 6. Let $\varphi_n(t_1, t_2)$, $\varphi_{in}(t_i)$, $F_n(x_1, x_2)$, $F_{in}(x_i)$ and $M_i(x_i)$, $(i = 1, 2)$,
have the same meaning as in Theorem 5. If $\varphi_n(t_1, t_2)$ exist for $|t_i| < \alpha_i$,
$(i = 1, 2)$, and if there exists a finite valued function $\varphi(t_1, t_2)$ defined for $|t_i| < \alpha_i$,
such that $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$, $|t_i| < \alpha_i$,
then

(a) $\lim_{x_i \to +\infty} M_i(x_i)e^{|t_i| x_i} = 0$ for $|t_i| < \alpha_i$, $(i = 1, 2)$;

(b) there exists a d.f. $F(x_1, x_2)$, such that $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$ at each point of
continuity of $F(x_1, x_2)$;

(c) the m.g.f. of $F(x_1, x_2)$ exists for $|t_i| < \alpha_i$ and is identically equal to $\varphi(t_1, t_2)$ for
$|t_i| < \alpha_i$, $(i = 1, 2)$;

(d) $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ uniformly for $|t_i| \le \beta_i < \alpha_i$, $(i = 1, 2)$.

n-*ao 

To prove (a), it is sufficient to notice that $\varphi_{1n}(t_1) = \varphi_n(t_1, 0)$ and
$\varphi_{2n}(t_2) = \varphi_n(0, t_2)$. Consequently we have

$\lim_{n\to\infty} \varphi_{1n}(t_1) = \varphi(t_1, 0)$, $\lim_{n\to\infty} \varphi_{2n}(t_2) = \varphi(0, t_2)$, $|t_i| < \alpha_i$, $(i = 1, 2)$.

Therefore (a) follows immediately from Theorem 2.

To prove (b), we may notice that according to the Helly principle of selection
applied to the sequence $\{F_n(x_1, x_2)\}$, there exists a subsequence $\{F_{n_k}(x_1, x_2)\}$,
selected from the sequence $\{F_n(x_1, x_2)\}$, which is convergent to some function
$F(x_1, x_2)$ continuous on the right and with non-negative second difference.
But $F(x_1, x_2)$ must be a d.f. according to Theorem 5, since the relation (6) is
satisfied by the sequence $\{F_{n_k}(x_1, x_2)\}$. Moreover, the limit of the sequence
$\{\varphi_{n_k}(t_1, t_2)\}$, namely $\varphi(t_1, t_2)$, when considered in a sufficiently small neighborhood
of the origin, is the m.g.f. of $F(x_1, x_2)$. Since the d.f. is uniquely determined by
its m.g.f., it follows that every convergent subsequence of $\{F_n(x_1, x_2)\}$ converges
to the same limit $F(x_1, x_2)$ at each point of continuity of $F(x_1, x_2)$. This
is, however, the same as to say that the sequence $\{F_n(x_1, x_2)\}$ itself converges to
$F(x_1, x_2)$ at each point of continuity of $F(x_1, x_2)$.

To prove (c), we have to show that the m.g.f. of $F(x_1, x_2)$, say $\varphi^*(t_1, t_2)$, exists
for $|t_i| < \alpha_i$ and is equal to $\varphi(t_1, t_2)$, $|t_i| < \alpha_i$, $(i = 1, 2)$. (We have proved that
$\varphi^*(t_1, t_2) = \varphi(t_1, t_2)$ only for sufficiently small values of $|t_1|$ and $|t_2|$.) The
existence of $\varphi^*(t_1, t_2)$ for $|t_i| < \alpha_i$, $(i = 1, 2)$, can be easily established by the
method used by Curtiss [1, p. 433]. Suppose indeed that $\varphi^*(t_1, t_2)$ does not
exist at some point $(t_1^0, t_2^0)$, where $|t_i^0| < \alpha_i$, $(i = 1, 2)$. That means that we
can find a positive number $N$ such that

(17) $\int_{-N}^{N}\int_{-N}^{N} e^{t_1^0 x_1 + t_2^0 x_2}\, dF(x_1, x_2) > \varphi(t_1^0, t_2^0)$.

Since $\lim_{n\to\infty} F_n(x_1, x_2) = F(x_1, x_2)$ at all points of continuity of $F(x_1, x_2)$, and since
$N$ can be so chosen that the marginal d.f.'s $F_1(x_1)$ and $F_2(x_2)$ are continuous for
$x_1 = x_2 = \pm N$, it follows that

(18) $\lim_{n\to\infty} \int_{-N}^{N}\int_{-N}^{N} e^{t_1^0 x_1 + t_2^0 x_2}\, dF_n(x_1, x_2) = \int_{-N}^{N}\int_{-N}^{N} e^{t_1^0 x_1 + t_2^0 x_2}\, dF(x_1, x_2)$.

The formulas (17) and (18) give $\lim_{n\to\infty} \varphi_n(t_1^0, t_2^0) > \varphi(t_1^0, t_2^0)$, which is impossible
because $\lim_{n\to\infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ for $|t_i| < \alpha_i$, $(i = 1, 2)$.

To prove that $\varphi(t_1, t_2) = \varphi^*(t_1, t_2)$ for $|t_i| < \alpha_i$, $(i = 1, 2)$, let $(t_1, t_2)$ denote
a fixed point such that $|t_i| < \alpha_i$, $(i = 1, 2)$. Clearly, $\varphi_n(tt_1, tt_2)$, $(n = 1, 2, \cdots)$,
and $\varphi^*(tt_1, tt_2)$, considered as functions of the variable $t$, are m.g.f.'s provided that
$|tt_i| < \alpha_i$, $(i = 1, 2)$. (See first part of proof of Theorem 4.) Now, according
to Theorem 2, the limit of the sequence $\{\varphi_n(tt_1, tt_2)\}$, namely $\varphi(tt_1, tt_2)$, $|tt_i| < \alpha_i$,
$(i = 1, 2)$, is also a m.g.f. Since $\varphi(tt_1, tt_2) = \varphi^*(tt_1, tt_2)$ in a sufficiently small
interval containing the point $t = 0$, it follows from the uniqueness property of the
m.g.f. in the univariate case that $\varphi(tt_1, tt_2) = \varphi^*(tt_1, tt_2)$ identically for $|tt_i| < \alpha_i$,
$(i = 1, 2)$. Putting $t = 1$, we find $\varphi(t_1, t_2) = \varphi^*(t_1, t_2)$, $|t_i| < \alpha_i$, $(i = 1, 2)$.
Thus (c) is completely proved.

To prove (d), it is sufficient to notice that the sequence $\{\varphi_n(t_1, t_2)\}$ is uniformly
continuous in each two-dimensional interval $|t_i| < \beta_i < \alpha_i$, $(i = 1, 2)$, (that
is, for any $\epsilon > 0$, there exists a positive number $\delta = \delta(\epsilon)$ such that

$|\varphi_n(t_1', t_2') - \varphi_n(t_1'', t_2'')| < \epsilon$

if

$|t_i' - t_i''| < \delta$, $|t_i'| < \beta_i$, $|t_i''| < \beta_i$, $(i = 1, 2)$, $(n = 1, 2, \cdots)$).

Consequently, the sequence $\{\varphi_n(t_1, t_2)\}$, which is convergent for $|t_i| < \beta_i$, must
be uniformly convergent if $|t_i| < \beta_i$, $(i = 1, 2)$.

References 

[1] J. H. Curtiss, “A note on the theory of moment generating functions”, Annals of
Math. Stat., Vol. 13 (1942).

[2] D. V. Widder, The Laplace Transform, Princeton Univ. Press, 1941.

[3] H. Cramér, Random Variables and Probability Distributions, Cambridge Tract No.
36, 1937.



A GENERALIZATION OF TSHEBYSHEV'S INEQUALITY TO TWO 

DIMENSIONS 

By Z. W. Birnbaum, J. Raymond, and H. S. Zuckerman 
University of Washington 

1. Let $X_1, X_2, \cdots, X_n$ be independent random variables with expectations
$E(X_j) = e_j$ and variances $\sigma^2(X_j) = \sigma_j^2$ for $j = 1, 2, \cdots, n$. The question
may be asked: What is the upper bound for the probability
that the point $(X_1, X_2, \cdots, X_n)$ does not fall inside of the ellipsoid

$\sum_{j=1}^{n} \frac{(x_j - e_j)^2}{t_j^2} = 1$?

For $n = 1$ the answer to this question is given by Tshebyshev's inequality

(1.1) $P\left[\frac{(X - E(X))^2}{t^2} \ge 1\right] \le \frac{\sigma^2(X)}{t^2}$

which can not be improved without further assumptions. By a trivial generalization
of the argument leading to (1.1) one can prove the inequality

(1.2) $P\left[\sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1\right] \le \sum_{j=1}^{n} \frac{\sigma_j^2}{t_j^2}$

for any integer $n$. This inequality, however, can be improved for $n \ge 2$. In
particular, for $n = 2$, the following theorem will be proved:

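Theorem 1.1. Let $X$ and $Y$ be independent random variables, with expectations
$E(X) = X_0$, $E(Y) = Y_0$ and variances $\sigma_X^2$, $\sigma_Y^2$. Then, for any $s > 0$, $t > 0$
such that $\frac{\sigma_X^2}{s^2} \le \frac{\sigma_Y^2}{t^2}$, we have

(1.3) $P\left[\frac{(X - X_0)^2}{s^2} + \frac{(Y - Y_0)^2}{t^2} \ge 1\right] \le L(s, t)$

where

(1.4) $L(s, t) = \begin{cases}
1 & \text{if } \frac{\sigma_X^2}{s^2} + \frac{\sigma_Y^2}{t^2} \ge 1, \\
\frac{\sigma_Y^2/t^2}{1 - \sigma_X^2/s^2} & \text{if } \frac{\sigma_X^2}{s^2} + \frac{\sigma_Y^2}{t^2} \le 1 \le \frac{1}{2}\left(\frac{\sigma_X^2}{s^2} + \frac{2\sigma_Y^2}{t^2} + \sqrt{\frac{\sigma_X^4}{s^4} + \frac{4\sigma_Y^4}{t^4}}\right), \\
\frac{\sigma_X^2}{s^2} + \frac{\sigma_Y^2}{t^2} - \frac{\sigma_X^2\sigma_Y^2}{s^2 t^2} & \text{if } \frac{1}{2}\left(\frac{\sigma_X^2}{s^2} + \frac{2\sigma_Y^2}{t^2} + \sqrt{\frac{\sigma_X^4}{s^4} + \frac{4\sigma_Y^4}{t^4}}\right) \le 1.
\end{cases}$

For any given $\sigma_X^2$, $\sigma_Y^2$, $s > 0$, $t > 0$ such that $\frac{\sigma_X^2}{s^2} \le \frac{\sigma_Y^2}{t^2}$ there exist independent random
variables $X$ and $Y$ with the variances $\sigma_X^2$, $\sigma_Y^2$, such that the equality sign is true in
(1.3).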

This theorem is a special case of the more general statement: 

Theorem 1.2. Let $W$, $Z$ be independent random variables such that

(1.4) $P(W < 0) = P(Z < 0) = 0$,

(1.5) $E(W) = \lambda$, $E(Z) = \mu$,

(1.6) $\lambda \le \mu$.

Then, for any $t > 0$, we have

(1.7) $P(W + Z \ge t) \le M(t)$

where

(1.8) $M(t) = \begin{cases}
1 & \text{if } t \le \lambda + \mu, \\
\frac{\mu}{t - \lambda} & \text{if } \lambda + \mu \le t \le \frac{1}{2}\left(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\right), \\
\frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} & \text{if } \frac{1}{2}\left(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\right) \le t.
\end{cases}$

For any given $\lambda > 0$, $\mu > 0$, $\lambda \le \mu$, and $t > 0$, there exist independent variables
$W$, $Z$ such that (1.4) and (1.5) are fulfilled and that the equality sign is true in (1.7).
Theorem 1.1 is obtained from Theorem 1.2 by writing

$W = \frac{(X - X_0)^2}{s^2}$, $Z = \frac{(Y - Y_0)^2}{t^2}$, $t = 1$.
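
As an illustration only, the bound of Theorem 1.2 can be evaluated numerically. The sketch below assumes that (1.8) has the three branches $1$, $\mu/(t-\lambda)$ and $(\lambda+\mu)/t - \lambda\mu/t^2$ with switching point $\frac{1}{2}(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2})$; the names `bound_M` and `bound_L` are hypothetical and not part of the paper.

```python
import math


def bound_M(t, lam, mu):
    """Assumed form of M(t) in (1.8): upper bound for P(W + Z >= t).

    Requires 0 <= lam <= mu and t > 0, as in (1.5) and (1.6).
    """
    if not (0 <= lam <= mu and t > 0):
        raise ValueError("need 0 <= lam <= mu and t > 0")
    if t <= lam + mu:
        return 1.0
    switch = 0.5 * (lam + 2 * mu + math.sqrt(lam * lam + 4 * mu * mu))
    if t <= switch:
        return mu / (t - lam)
    return (lam + mu) / t - lam * mu / (t * t)


def bound_L(s, t, var_x, var_y):
    """Theorem 1.1 via the substitution W = (X-X0)^2/s^2, Z = (Y-Y0)^2/t^2, threshold 1."""
    lam, mu = var_x / s**2, var_y / t**2
    if lam > mu:
        raise ValueError("theorem assumes var_x/s^2 <= var_y/t^2")
    return bound_M(1.0, lam, mu)


if __name__ == "__main__":
    # Compare with the elementary bound (lam + mu) / t of the type (1.2).
    for t in (3.0, 4.0, 6.0):
        print(t, bound_M(t, 1.0, 2.0), min(1.0, 3.0 / t))
```

With $\lambda = 1$, $\mu = 2$ the sketch gives $2/3$ at $t = 4$ and $4/9$ at $t = 6$, against $3/4$ and $1/2$ from the elementary bound.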


2. Before proving Theorem 1.2 we shall derive two lemmas. The first of these 
lemmas deals with more than one variable. Since its proof for general m does not 
present any additional difficulties it will be stated and proven for any number 
m > 1 of variables, although in the proof of Theorem 1.2 it will be used only 
for m = 1. 

Lemma 1. Let $U, V_1, V_2, \cdots, V_m$ be independent discrete random variables
with only non-negative possible values, and let $U$ have a probability distribution
with the possible values $0 \le U_1 < U_2 < \cdots < U_n$ and the probabilities $P(U_i) = r_i$
for $i = 1, 2, \cdots, n$. We consider any three possible values $U_j$, $U_k$, $U_l$ of $U$ such
that

$0 \le U_j < U_k < U_l$,

with the corresponding probabilities $r_j$, $r_k$, $r_l$. Then, for any $t > 0$, there exists a
random variable $U'$ with the same distribution as $U$ except that the probabilities
$r_j$, $r_k$, $r_l$ of $U_j$, $U_k$, $U_l$ are replaced by $r_j'$, $r_k'$, $r_l'$ such that

(2.1) $E(U') = E(U)$

(2.2) one of $r_j'$, $r_k'$, $r_l'$ is zero

(2.3) $P(U' + V_1 + \cdots + V_m \ge t) \ge P(U + V_1 + \cdots + V_m \ge t)$.

Proof: let $r_j'$, $r_k'$, $r_l'$ be written

(2.4) $r_j' = r_j + \alpha\beta$, $r_k' = r_k - \beta$, $r_l' = r_l + (1 - \alpha)\beta$.

For any $\alpha$, $\beta$ we then have

$r_j' + r_k' + r_l' = r_j + r_k + r_l$.

Choosing

(2.5) $\alpha = (U_l - U_k)/(U_l - U_j)$

we obtain the equality

$U_j r_j' + U_k r_k' + U_l r_l' = U_j r_j + U_k r_k + U_l r_l$

so that (2.1) is true for any $\beta$.
We obviously have

(2.6) $P\left(U + \sum_{\nu=1}^{m} V_\nu \ge t\right) = \sum_{i=1}^{n} r_i P\left(\sum_{\nu=1}^{m} V_\nu \ge t - U_i\right)$.

The variable $U'$ has the same possible values $U_i$ as the variable $U$. Writing
$P(U' = U_i) = r_i'$, for $i = 1, 2, \cdots, n$, we also have

(2.7) $P\left(U' + \sum_{\nu=1}^{m} V_\nu \ge t\right) = \sum_{i=1}^{n} r_i' P\left(\sum_{\nu=1}^{m} V_\nu \ge t - U_i\right)$.

From (2.6), (2.7), and (2.4) we obtain

(2.8) $P\left(U' + \sum_{\nu=1}^{m} V_\nu \ge t\right) - P\left(U + \sum_{\nu=1}^{m} V_\nu \ge t\right) = \alpha\beta P\left(\sum_{\nu=1}^{m} V_\nu \ge t - U_j\right) - \beta P\left(\sum_{\nu=1}^{m} V_\nu \ge t - U_k\right) + (1 - \alpha)\beta P\left(\sum_{\nu=1}^{m} V_\nu \ge t - U_l\right)$.

For $\alpha$ determined by (2.5), the right-hand side of (2.8) is of the form $C\beta$, and
will be positive if sign $\beta$ = sign $C$. If sign $C$ is positive, we choose $\beta = r_k$ and
have, from (2.4), $r_k' = 0$, and, from (2.8), the inequality (2.3). If sign $C$ is
negative, we choose $\beta = -\mathrm{Min}\left(\frac{r_j}{\alpha}, \frac{r_l}{1 - \alpha}\right)$ which leads to either $r_j' = 0$ or $r_l' = 0$,
and again to (2.3). In both cases we have kept the probabilities $r_j'$, $r_k'$, $r_l'$
non-negative as they should be.
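
The mass-shifting step of Lemma 1 can be traced on a small example. The following is a minimal sketch for $m = 1$, assuming the choice $\alpha = (U_l - U_k)/(U_l - U_j)$ of (2.5) and, for non-positive $C$, the choice $\beta = -\mathrm{Min}(r_j/\alpha,\ r_l/(1-\alpha))$; the function names are hypothetical. Exact rational arithmetic is used so that the equalities can be checked without rounding.

```python
from fractions import Fraction as F


def tail(dist, x):
    """P(V >= x) for a discrete law given as {value: probability}."""
    return sum(p for v, p in dist.items() if v >= x)


def prob_sum_at_least(u_dist, v_dist, t):
    """P(U + V >= t) for independent U and V, as in (2.6) with m = 1."""
    return sum(r * tail(v_dist, t - u) for u, r in u_dist.items())


def mean(dist):
    return sum(v * p for v, p in dist.items())


def lemma1_step(u_dist, v_dist, t, uj, uk, ul):
    """Shift mass among uj < uk < ul so that one of the three probabilities vanishes."""
    rj, rk, rl = u_dist[uj], u_dist[uk], u_dist[ul]
    alpha = F(ul - uk) / F(ul - uj)                      # choice (2.5)
    # Coefficient C of beta on the right-hand side of (2.8).
    c = (alpha * tail(v_dist, t - uj) - tail(v_dist, t - uk)
         + (1 - alpha) * tail(v_dist, t - ul))
    beta = rk if c > 0 else -min(rj / alpha, rl / (1 - alpha))
    shifted = dict(u_dist)
    shifted[uj] = rj + alpha * beta                      # relations (2.4)
    shifted[uk] = rk - beta
    shifted[ul] = rl + (1 - alpha) * beta
    return {u: r for u, r in shifted.items() if r != 0}


# Illustrative data (not from the paper): U uniform on {0, 1, 2}, V on {0, 2}.
u = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
v = {0: F(1, 2), 2: F(1, 2)}
u_new = lemma1_step(u, v, 3, 0, 1, 2)
print(u_new, prob_sum_at_least(u, v, 3), prob_sum_at_least(u_new, v, 3))
```

In this example the step concentrates $U'$ at the value 1, keeps the expectation equal to 1, and raises $P(U + V \ge 3)$ from $1/3$ to $1/2$.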





Lemma 2. Let the discrete random variable $U$ have only the two non-negative
values $U_1 < U_2$, with the corresponding probabilities $r_1$, $r_2$, and let $t$ be a given
number such that

(2.9) $E(U) \le t < U_2$.

Then there exists a number $a > 0$ such that the random variable $U'$ with the possible
values

(2.91) $U_1' = U_1 + a$, $U_2' = t$

and the corresponding probabilities $r_1$, $r_2$, has the properties

(2.92) $0 \le U_1' \le U_2'$

(2.93) $E(U') = E(U)$.

Proof: to have (2.91) and (2.93) it is sufficient to choose

$a = \frac{r_2 (U_2 - t)}{r_1}$.

Then (2.92) is also fulfilled since, in view of (2.9), we have

$U_1' = \frac{r_1 U_1 + r_2 U_2 - r_2 t}{r_1} = \frac{E(U) - r_2 t}{r_1} \le \frac{t - r_2 t}{r_1} = t = U_2'$,

and obviously $a > 0$ and hence $U_1' > U_1 \ge 0$.
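
A short numerical sketch of the replacement in Lemma 2, assuming the shift $a = r_2(U_2 - t)/r_1$ read from the proof; the function name and the sample values are hypothetical.

```python
from fractions import Fraction as F


def lemma2_replace(u1, u2, r1, r2, t):
    """Return (U1', U2') = (U1 + a, t) with a = r2 * (U2 - t) / r1, as in (2.91)."""
    if not (r1 * u1 + r2 * u2 <= t < u2):
        raise ValueError("need E(U) <= t < U2, condition (2.9)")
    a = r2 * (u2 - t) / r1
    return u1 + a, t


# Illustrative values: U takes 0 and 10 with probabilities 3/4 and 1/4, t = 4.
new_u1, new_u2 = lemma2_replace(0, 10, F(3, 4), F(1, 4), 4)
print(new_u1, new_u2)
```

Here the larger value is pulled down from 10 to $t = 4$ while the smaller value moves up from 0 to 2, so the expectation stays at $5/2$ and (2.92) holds.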

3. Theorem 1.2 will first be proven under the assumption that $W$ and $Z$ are
discrete random variables, each with a finite number of non-negative possible
values. By repeatedly applying Lemma 1 with $m = 1$, $U = W$, $V_1 = Z$, we
reduce the number of possible values of $W$ which have non-zero probabilities
to two, and denote those possible values by $W_1 \le W_2$, and their probabilities
by $p_1$ and $p_2 = 1 - p_1$. Then, applying Lemma 1 to the case $m = 1$, $U = Z$,
$V_1 = W$, we similarly reduce the possible values of $Z$ to the two non-negative
values $Z_1 \le Z_2$, and denote the corresponding probabilities by $q_1$ and $q_2 = 1 - q_1$.
Throughout all these steps the expectations $E(W) = \lambda$ and $E(Z) = \mu$ remain
unchanged, and $P(W + Z \ge t)$ is not decreased.

For $t \le \lambda + \mu$, inequality (1.7) is obviously true, and equality is attained for
$W$ having the only possible value $\lambda$ with probability 1 and $Z$ having the only
possible value $\mu$ with probability 1.

For the remainder of the proof we assume $t > \lambda + \mu$. We then have

$t > \lambda + \mu \ge W_1 + Z_1$.

If $W_2 > t$, we may replace it by $W_2 = t$ according to Lemma 2. Similarly, if
$Z_2 > t$, we may replace it by $Z_2 = t$. The probability $P(W + Z \ge t)$ is not
decreased in this process. We may thus assume, without loss of generality, that

$W_2 \le t$, $Z_2 \le t$.





The joint distribution of $(W, Z)$ has now the possible values represented by the
four points $(W_1, Z_1)$, $(W_1, Z_2)$, $(W_2, Z_1)$, $(W_2, Z_2)$. The coordinates of these
four points and their probabilities fulfill the following conditions

(3.1) $0 \le W_1 \le \lambda \le W_2 \le t$; $0 \le Z_1 \le \mu \le Z_2 \le t$

(3.2) $p_1 + p_2 = q_1 + q_2 = 1$

(3.3) $p_1 W_1 + p_2 W_2 = \lambda$, $q_1 Z_1 + q_2 Z_2 = \mu$.

In view of (3.1), the point $(W_1, Z_1)$ always lies below the line $W + Z = t$. The
other points may or may not lie below that line. Accordingly, we distinguish
the cases listed in Table I. These clearly include all possible cases since $(W_2, Z_2)$
can not be below the line $W + Z = t$ without all the other points being below
that line.

TABLE I

Case   Points below line W + Z = t                     Points not below line W + Z = t
I      (W_1, Z_1)                                      (W_1, Z_2), (W_2, Z_1), (W_2, Z_2)
II     (W_1, Z_1), (W_2, Z_1)                          (W_1, Z_2), (W_2, Z_2)
III    (W_1, Z_1), (W_1, Z_2)                          (W_2, Z_1), (W_2, Z_2)
IV     (W_1, Z_1), (W_1, Z_2), (W_2, Z_1)              (W_2, Z_2)
V      (W_1, Z_1), (W_1, Z_2), (W_2, Z_1), (W_2, Z_2)  none

In case V we have $P(W + Z \ge t) = 0$.

For the discussion of the remaining cases we note the following relationships
which follow from (3.2) and (3.3).

$p_1 = \frac{W_2 - \lambda}{W_2 - W_1}$, $p_2 = \frac{\lambda - W_1}{W_2 - W_1}$, $q_1 = \frac{Z_2 - \mu}{Z_2 - Z_1}$, $q_2 = \frac{\mu - Z_1}{Z_2 - Z_1}$.

In case I we have

(3.41) $W_1 + Z_1 < t$, $W_2 + Z_1 \ge t$, $W_1 + Z_2 \ge t$, $W_2 + Z_2 \ge t$,

$P = P(W + Z \ge t) = p_2 q_1 + p_1 q_2 + p_2 q_2 = 1 - p_1 q_1 = 1 - \frac{W_2 - \lambda}{W_2 - W_1} \cdot \frac{Z_2 - \mu}{Z_2 - Z_1}$.

Since $P$ is a decreasing function of $W_1$ and $Z_1$, we replace $W_1$ and $Z_1$ by the
smallest values compatible with (3.41), namely $W_1 = t - Z_2$, $Z_1 = t - W_2$,
and obtain

$P \le 1 - \frac{(W_2 - \lambda)(Z_2 - \mu)}{(W_2 + Z_2 - t)^2} = R(W_2, Z_2)$.





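For fixed $Z_2$, $R(W_2, Z_2)$ has a minimum at $W_2 = Z_2 + 2\lambda - t$ and no other
extremum, hence it assumes its maximum at one or both of the end-points of the
interval for $W_2$ which, by (3.1) and (3.41), is

$t - Z_1 \le W_2 \le t$.

In view of (3.1) we also have $t - \mu \le t - Z_1$, and hence

$P \le \mathrm{Max}\,[R(t - \mu, Z_2), R(t, Z_2)]$.

We find

$R(t - \mu, Z_2) = 1 - \frac{t - \lambda - \mu}{Z_2 - \mu} \le 1 - \frac{t - \lambda - \mu}{t - \mu} = \frac{\lambda}{t - \mu}$

and

$R(t, Z_2) = 1 - \frac{(t - \lambda)(Z_2 - \mu)}{Z_2^2} = R^{(1)}(Z_2)$.

This last expression has a minimum for $Z_2 = 2\mu$ and no other extremum, hence it
assumes its maximum at the ends of the interval for $Z_2$ which, by (3.41) and
(3.1), is

$t - W_1 \le Z_2 \le t$.

From (3.1) we also have $t - \lambda \le t - W_1$, and thus

$R(t, Z_2) \le \mathrm{Max}\,[R^{(1)}(t - \lambda), R^{(1)}(t)] = \mathrm{Max}\left[\frac{\mu}{t - \lambda}, \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}\right]$.

Finally, we obtain

$P \le \mathrm{Max}\left[\frac{\lambda}{t - \mu}, \frac{\mu}{t - \lambda}, \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}\right]$.

Each of the values $P = \frac{\lambda}{t - \mu}$, $\frac{\mu}{t - \lambda}$, $\frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$ can be attained in case I,
as is shown by the probability distributions

(3.42) $W_1 = 0$, $W_2 = t - \mu$, $Z_1 = \mu$, $Z_2 = t$,
       $p_1 = 1 - \frac{\lambda}{t - \mu}$, $p_2 = \frac{\lambda}{t - \mu}$, $q_1 = 1$, $q_2 = 0$;

(3.43) $W_1 = \lambda$, $W_2 = t$, $Z_1 = 0$, $Z_2 = t - \lambda$,
       $p_1 = 1$, $p_2 = 0$, $q_1 = 1 - \frac{\mu}{t - \lambda}$, $q_2 = \frac{\mu}{t - \lambda}$;

(3.44) $W_1 = 0$, $W_2 = t$, $Z_1 = 0$, $Z_2 = t$,
       $p_1 = 1 - \frac{\lambda}{t}$, $p_2 = \frac{\lambda}{t}$, $q_1 = 1 - \frac{\mu}{t}$, $q_2 = \frac{\mu}{t}$.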



In case II we have

(3.51) $W_1 + Z_1 < t$, $W_2 + Z_1 < t$, $W_1 + Z_2 \ge t$, $W_2 + Z_2 \ge t$,

$P = P(W + Z \ge t) = p_1 q_2 + p_2 q_2 = q_2 = \frac{\mu - Z_1}{Z_2 - Z_1}$.

This is a decreasing function of $Z_1$ as well as of $Z_2$ and hence takes its maximum
for the smallest values of $Z_1$ and $Z_2$ compatible with (3.1) and (3.51), that is for
$Z_1 = 0$, $Z_2 = t - \lambda$. We thus obtain

$P \le \frac{\mu}{t - \lambda}$.

This upper bound can be attained in case II, as may be seen from the distribution

(3.52) $W_1 = \lambda$, $W_2 = \lambda$, $Z_1 = 0$, $Z_2 = t - \lambda$,
       $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{2}$, $q_1 = 1 - \frac{\mu}{t - \lambda}$, $q_2 = \frac{\mu}{t - \lambda}$.

Case III is symmetrical with case II and leads to the inequality

$P \le \frac{\lambda}{t - \mu}$.

In case IV we have

(3.61) $W_1 + Z_1 < t$, $W_2 + Z_1 < t$, $W_1 + Z_2 < t$, $W_2 + Z_2 \ge t$,

$P = P(W + Z \ge t) = p_2 q_2 = \frac{(\lambda - W_1)(\mu - Z_1)}{(W_2 - W_1)(Z_2 - Z_1)}$.

The right hand side is a decreasing function of each of the variables $W_1$, $W_2$,
$Z_1$, $Z_2$, and hence is increased by choosing for these variables the smallest values
compatible with (3.61), i.e.

(3.62) $W_1 = Z_1 = 0$, $W_2 + Z_2 = t$

for which we obtain

$P \le \frac{\lambda\mu}{W_2 (t - W_2)} = R^{(2)}(W_2)$.

Since $R^{(2)}(W_2)$ has a minimum at $W_2 = \frac{t}{2}$ and no other extremum, it attains its
largest value at one of the end points of the interval for $W_2$ which, by (3.1),
(3.61) and (3.62), is

$\lambda \le W_2 \le t - \mu$.

This leads to

$P \le \mathrm{Max}\,[R^{(2)}(\lambda), R^{(2)}(t - \mu)] = \mathrm{Max}\left[\frac{\mu}{t - \lambda}, \frac{\lambda}{t - \mu}\right]$.

The upper bounds $\frac{\mu}{t - \lambda}$, $\frac{\lambda}{t - \mu}$, respectively, are attained in case IV for the
probability distribution

(3.63) $W_1 = 0$, $W_2 = \lambda$, $Z_1 = 0$, $Z_2 = t - \lambda$,
       $p_1 = 0$, $p_2 = 1$, $q_1 = 1 - \frac{\mu}{t - \lambda}$, $q_2 = \frac{\mu}{t - \lambda}$

and

$W_1 = 0$, $W_2 = t - \mu$, $Z_1 = 0$, $Z_2 = \mu$,
$p_1 = 1 - \frac{\lambda}{t - \mu}$, $p_2 = \frac{\lambda}{t - \mu}$, $q_1 = 0$, $q_2 = 1$.

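From the preceding discussion we conclude that $P = P(W + Z \ge t)$ always
fulfills the inequality

$P \le \mathrm{Max}\left[\frac{\lambda}{t - \mu}, \frac{\mu}{t - \lambda}, \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}\right] = U(t)$

for $t > \lambda + \mu$. Since we have assumed $\lambda \le \mu$, we have

$\frac{\lambda}{t - \mu} \le \frac{\mu}{t - \lambda}$

for $t > \lambda + \mu$ and therefore

$U(t) = \mathrm{Max}\left[\frac{\mu}{t - \lambda}, \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}\right]$ for $t > \lambda + \mu$.

It is easily verified that

$\frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \le \frac{\mu}{t - \lambda}$ for $\lambda + \mu < t \le \frac{1}{2}\left(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\right)$

and

$\frac{\mu}{t - \lambda} \le \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$ for $\frac{1}{2}\left(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\right) \le t$

so that we have $U(t) = M(t)$ as defined in (1.8). For given $\lambda$, $\mu$ and any
$t > \lambda + \mu$, the equality $P = \frac{\mu}{t - \lambda}$ is fulfilled for the distributions (3.43), (3.52) and
(3.63), while the equality $P = \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$ is true for the distribution (3.44).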

This completes the proof of Theorem 1.2 for discrete random variables. If 
W and Z are independent random variables with the cumulative probability 
functions $P(W < w) = F(w)$ and $P(Z < z) = G(z)$, then each of these cumulative
probability functions can be uniformly approximated by a step function with a 
finite number of steps, that is by the cumulative probability function of a discrete 
random variable with a finite number of possible values. Since for such variables 
Theorem 1.2 is proven, it also is true for the general random variables W and Z. 
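
The reduction to two-point distributions suggests a simple numerical check of the result. The sketch below draws random pairs of independent two-point variables with prescribed means and supports in $[0, t]$ and compares $P(W + Z \ge t)$ with the bound; it assumes the three-branch form of $M(t)$ read from (1.8), and all function names are hypothetical. It is a sanity check under these assumptions, not a substitute for the proof.

```python
import math
import random


def bound_M(t, lam, mu):
    """Assumed form of M(t) in (1.8), for 0 <= lam <= mu and t > 0."""
    if t <= lam + mu:
        return 1.0
    switch = 0.5 * (lam + 2 * mu + math.sqrt(lam * lam + 4 * mu * mu))
    if t <= switch:
        return mu / (t - lam)
    return (lam + mu) / t - lam * mu / (t * t)


def two_point(mean, lo, hi):
    """Two-point law on {lo, hi} with the given mean, as [(value, probability), ...]."""
    p_hi = (mean - lo) / (hi - lo)
    return [(lo, 1.0 - p_hi), (hi, p_hi)]


def prob_at_least(w, z, t):
    """P(W + Z >= t) for independent discrete W and Z."""
    return sum(p * q for a, p in w for b, q in z if a + b >= t)


def largest_probability_found(lam, mu, t, trials=20000, seed=0):
    """Random search over two-point laws with W1 < lam < W2 <= t and Z1 < mu < Z2 <= t."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        w = two_point(lam, lam * rng.random(), t - (t - lam) * rng.random())
        z = two_point(mu, mu * rng.random(), t - (t - mu) * rng.random())
        best = max(best, prob_at_least(w, z, t))
    return best


if __name__ == "__main__":
    for lam, mu, t in [(1.0, 2.0, 4.0), (1.0, 2.0, 6.0)]:
        print(lam, mu, t, largest_probability_found(lam, mu, t), bound_M(t, lam, mu))
```

The distributions (3.43) and (3.44) can be fed to `prob_at_least` directly; with $\lambda = 1$, $\mu = 2$ they give $2/3$ at $t = 4$ and $4/9$ at $t = 6$, which are the values of the bound.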





4. An attempt to extend the method used in proving Theorem 1.2 to more 
than two variables leads to arguments of a prohibitive length. It is possible, 
however, to obtain corollaries of Theorems 1.1 and 1.2 which lead to an improve¬ 
ment of inequality (1.2) for n variables. 

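Corollary 2.1. Let $X_1, X_2, \cdots, X_n$ be independent random variables with
expectations $E(X_j) = e_j$ and variances $\sigma^2(X_j) = \sigma_j^2$. Then, for any $t_j > 0$,
$j = 1, 2, \cdots, n$, and any $m$ such that

$\Sigma_1 = \sum_{j=1}^{m} \frac{\sigma_j^2}{t_j^2} \le \sum_{j=m+1}^{n} \frac{\sigma_j^2}{t_j^2} = \Sigma_2$

we have the inequality

$P\left(\sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1\right) \le \begin{cases}
1 & \text{if } 1 \le \sum_{j=1}^{n} \frac{\sigma_j^2}{t_j^2} = \Sigma_1 + \Sigma_2, \\
\frac{\Sigma_2}{1 - \Sigma_1} & \text{if } \Sigma_1 + \Sigma_2 \le 1 \le \frac{1}{2}\left[\Sigma_1 + 2\Sigma_2 + \sqrt{\Sigma_1^2 + 4\Sigma_2^2}\right], \\
\Sigma_1 + \Sigma_2 - \Sigma_1\Sigma_2 & \text{if } \frac{1}{2}\left[\Sigma_1 + 2\Sigma_2 + \sqrt{\Sigma_1^2 + 4\Sigma_2^2}\right] \le 1.
\end{cases}$

This corollary is a special case of the following corollary to Theorem 1.2.

Corollary 2.2. Let $W_1, W_2, \cdots, W_n$ be independent random variables
such that $P(W_j < 0) = 0$ for $j = 1, 2, \cdots, n$, and let $m$ be any integer such that

$\sum_{j=1}^{m} E(W_j) = \lambda$, $\sum_{j=m+1}^{n} E(W_j) = \mu$, $\lambda \le \mu$.

Then, for any $t > 0$, we have

$P\left(\sum_{j=1}^{n} W_j \ge t\right) \le M(t)$

where $M(t)$ is defined by (1.8).

This corollary follows immediately from Theorem 1.2 by writing

$W = \sum_{j=1}^{m} W_j$, $Z = \sum_{j=m+1}^{n} W_j$.

To obtain Corollary 2.1, one only has to write in Corollary 2.2

$W_j = \frac{(X_j - e_j)^2}{t_j^2}$, $t = 1$.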


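If some additional assumptions are made on the expectations $E(W_j)$ or on the
variances $\sigma_j^2$, the upper bounds in Corollaries 2.1 and 2.2 may be minimized by
proper choice of $m$ or of the $t_j$. For example, if all the variances are equal

$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$

and $n$ is even, one obtains the inequality

$P\left(\sum_{j=1}^{n} (X_j - e_j)^2 \ge t^2\right) \le \begin{cases}
1 & \text{if } t^2 \le n\sigma^2, \\
\frac{n\sigma^2}{2t^2 - n\sigma^2} & \text{if } n\sigma^2 \le t^2 \le \frac{3 + \sqrt{5}}{4}\, n\sigma^2, \\
\frac{n\sigma^2}{t^2}\left(1 - \frac{n\sigma^2}{4t^2}\right) & \text{if } \frac{3 + \sqrt{5}}{4}\, n\sigma^2 \le t^2.
\end{cases}$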


DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 
IN A CIRCULARLY CORRELATED UNIVERSE 1 

By R. B. Leipnik 

Cowles Commission for Research in Economics 
1. Summary. It is desired to find an approximate distribution of simple
form for the statistic $\bar r = \frac{x_1 x_2 + \cdots + x_T x_1}{x_1^2 + \cdots + x_T^2}$ ($\bar r$ is an estimate of the serial
correlation coefficient $\rho$ in a circular universe) in the case that $\rho \ne 0$ in the universe.
Such a distribution is obtained by smoothing the joint characteristic function
of the numerator and denominator of the expression for $\bar r$. The first two moments
are calculated; from these $\bar r$ is seen to be a consistent estimate of $\rho$. A
graph of this distribution for sample size $T = 20$ and various values of $\rho$ is given.

In addition, an approximate distribution for $\bar p = x_1^2 + \cdots + x_T^2$ is derived
which reduces to the exact ($\chi^2$-) distribution if $\rho = 0$. From a formula which
yields all moments, it is concluded that, at least up to the degree of approximation
attained, $\bar p/T$ is an unbiased and consistent estimate of the variance $\sigma_x^2$ of the observations.


2. Several writers have investigated the temporally homogeneous stochastic
process defined by

(1) $x_t - \rho x_{t-1} = z_t$, $t = 1, 2, \cdots, T$, $|\rho| < 1$,

where the $z_t$ are unobservable disturbances, normally and independently distributed
with mean zero and variance $\sigma^2$, the $x_t$ are observed variates, and the
“first observation” $x_0$ has a normal distribution with mean zero and such a
variance $\sigma_x^2$ that all later observations have the same variance. Thus we have

(2) $\sigma_x^2 = \frac{\sigma^2}{1 - \rho^2}$

and the joint distribution of a sample of $T + 1$ successive values is

(3) $g(x_0, x_1, \cdots, x_T) = \frac{(1 - \rho^2)^{1/2}}{(2\pi\sigma^2)^{(T+1)/2}} \exp\left[-\frac{1}{2\sigma^2}\left\{x_0^2 + x_T^2 - 2\rho(x_0 x_1 + \cdots + x_{T-1} x_T) + (1 + \rho^2)(x_1^2 + \cdots + x_{T-1}^2)\right\}\right]$.

Koopmans ([1], formula 96), by smoothing characteristic values, has obtained
an approximation to the distribution of the serial correlation coefficient $r$ for the
case $\rho = 0$, where

(4) $r = \frac{x_0 x_1 + \cdots + x_{T-1} x_T}{x_0^2 + \cdots + x_T^2}$.

1 Cowles Commission Papers, New Series, No. 21. 



This result is expressed in the form of a definite integral whose evaluation 
has not so far been effected. 

By considering the related circular stochastic process, where $x_0$ is defined to
be the same observation as $x_T$, great simplification is obtained. Here the
joint distribution of $x_1, x_2, \cdots, x_T$ is

(5) $f(x_1, \cdots, x_T) = \frac{\Lambda(\rho)}{(2\pi\sigma_x^2)^{T/2}} \exp\left[-\frac{1}{2\sigma_x^2(1 - \rho^2)}\left\{(1 + \rho^2)(x_1^2 + \cdots + x_T^2) - 2\rho(x_1 x_2 + \cdots + x_T x_1)\right\}\right]$, $\Lambda(\rho) = \frac{1 - \rho^T}{(1 - \rho^2)^{T/2}}$.

By smoothing characteristic values, Koopmans ([1], formula 92) found a definite
integral and Dixon ([2], 3.22) an explicit expression for an approximate distribution
of the circular serial correlation coefficient $\bar r$, for the case $\rho = 0$, where

(6) $\bar r = \frac{x_1 x_2 + \cdots + x_T x_1}{x_1^2 + \cdots + x_T^2}$.

Dixon's distribution $R_0(\bar r)$ has the simple form

(7) $R_0(\bar r) = \frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)} (1 - \bar r^2)^{T/2 - 1/2}$.

Rubin [3] proved these results to be equivalent. On the other hand, R. L.
Anderson [4] obtained the exact distribution of $\bar r$ in the case $\rho = 0$. Madow [5]
extended this result to the case $\rho \ne 0$, using a property of sufficient statistics
also noted by Koopmans ([1], p. 17) in connection with the non-circular problem.

It would, however, be difficult to find percentile points or moments from
Madow's exact distribution. An approximate distribution of $\bar r$ for $\rho \ne 0$,
together with its moments, analogous to Dixon-Koopmans' for $\rho = 0$, should
therefore be of interest. The purpose of this paper is to obtain such a distribution
from the circular universe (5). The statistic $\bar r$ is shown to be a consistent
estimate of $\rho$ within the limits imposed by the approximation. In addition, an
approximate distribution for $\bar p = x_1^2 + \cdots + x_T^2$ in the case $\rho \ne 0$ (which
reduces to the exact chi-squared distribution when $\rho = 0$) is derived, together
with all of its moments.

3. We begin by asking about an approximate joint distribution of $\bar p$ and $\bar q$ defined
by

(8) $\bar p = x_1^2 + \cdots + x_T^2$, $\bar q = x_1 x_2 + \cdots + x_T x_1$.




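Defining $\varphi(u, v)$ as the expectation of $\exp[i(u\bar p + v\bar q)]$, we have

(9) $\varphi(u, v) = \frac{\Lambda(\rho)}{(2\pi\sigma_x^2)^{T/2}} \int \cdots \int \exp\left[-\frac{1}{2\sigma_x^2}\left\{\left(\frac{1 + \rho^2}{1 - \rho^2} - 2i\sigma_x^2 u\right)\bar p - 2\left(\frac{\rho}{1 - \rho^2} + i\sigma_x^2 v\right)\bar q\right\}\right] dx_1 \cdots dx_T$.

On integration, we find

(10) $\varphi(u, v) = \Lambda(\rho)[\Delta(u, v)]^{-1/2}$

where $\Delta(u, v)$ is the determinant of the matrix associated with the quadratic
form within the curly brackets in (9). $\Delta(u, v)$ is a circulant; its value as determined
from the circulant formula ([2], p. 123) is

(11) $\Delta(u, v) = \prod_{t=1}^{T}\left(y - 2z\cos\frac{2\pi t}{T}\right)$

where $y$ and $z$ are defined by

(12) $y = \frac{1 + \rho^2}{1 - \rho^2} - 2i\sigma_x^2 u$, $z = \frac{\rho}{1 - \rho^2} + i\sigma_x^2 v$.

To get an approximation $\bar\Delta(u, v)$ to $\Delta(u, v)$ we smooth $\log \Delta(u, v)$ by Koopmans'
method. We have

(13) $\log \Delta(u, v) = \sum_{t=1}^{T} \log\left(y - 2z\cos\frac{2\pi t}{T}\right)$.

We define $\bar\Delta(u, v)$ through

(14) $\log \bar\Delta(u, v) = \int_{0}^{T} \log\left(y - 2z\cos\frac{2\pi t}{T}\right) dt$

in which the summation in (13) is replaced by integration. The integral in (14)
is easily evaluated ([6], p. 65) giving

(15) $\bar\Delta(u, v) = \left[\frac{y + \sqrt{y^2 - 4z^2}}{2}\right]^{T}$.

Incidentally, had we used $\bar q_L = x_1 x_{L+1} + \cdots + x_T x_{T+L}$ in place of $\bar q_1 = \bar q$ in (9),
we would have obtained the same expression (15) for $\bar\Delta(u, v)$.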
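
The circulant product formula and its smoothed counterpart can be compared numerically for real $y$ and $z$. The sketch below is an illustration under stated assumptions: the matrix is taken to have $y$ on the diagonal and $-z$ on the two cyclically adjacent positions, the product is $\prod_{t=1}^{T}(y - 2z\cos(2\pi t/T))$, and the smoothed value is $[(y + \sqrt{y^2 - 4z^2})/2]^T$. All function names are hypothetical, and the determinant is computed by plain Gaussian elimination rather than by any library routine.

```python
import math


def circulant_matrix(T, y, z):
    """T x T matrix with y on the diagonal and -z on the cyclic neighbours."""
    a = [[0.0] * T for _ in range(T)]
    for i in range(T):
        a[i][i] += y
        a[i][(i + 1) % T] -= z
        a[i][(i - 1) % T] -= z
    return a


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, det = len(m), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det


def product_formula(T, y, z):
    return math.prod(y - 2 * z * math.cos(2 * math.pi * t / T) for t in range(1, T + 1))


def smoothed_value(T, y, z):
    return ((y + math.sqrt(y * y - 4 * z * z)) / 2) ** T


if __name__ == "__main__":
    rho, T = 0.5, 10
    # Values of y and z at u = v = 0, under the assumed definitions of y and z.
    y, z = (1 + rho**2) / (1 - rho**2), rho / (1 - rho**2)
    print(determinant(circulant_matrix(T, y, z)), product_formula(T, y, z), smoothed_value(T, y, z))
```

At $u = v = 0$ the ratio of the exact product to the smoothed value works out to $(1 - \rho^T)^2$, which is already within a fraction of a percent of 1 for $\rho = 0.5$ and $T = 10$.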

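Setting $\bar\varphi(u, v) = \lambda(\rho)[\bar\Delta(u, v)]^{-1/2}$ we may determine $\lambda(\rho)$ by the requirement
$\bar\varphi(0, 0) = 1$. A simple calculation yields the result $\lambda(\rho) = (1 - \rho^2)^{-T/2}$. (Note
that $\frac{\Lambda(\rho)}{\lambda(\rho)} = 1 - \rho^T$ is close to 1 for large values of $T$.) Our result for $\bar\varphi(u, v)$
appears as

(16) $\bar\varphi(u, v) = \lambda(\rho)\left[\frac{y + \sqrt{y^2 - 4z^2}}{2}\right]^{-T/2}$.

The approximate joint distribution of $\bar p$ and $\bar q$ may be written as the double
Fourier integral

(17) $D(\bar p, \bar q) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar\varphi(u, v) \exp[-i(u\bar p + v\bar q)]\, du\, dv$

which we evaluate ([7], 576.3, 914.3) by changing integration variables from
$u$, $v$ to $y$, $z$ and integrating out $y$ and $z$ successively. We obtain finally

(18) $D(\bar p, \bar q) = \frac{T}{2} \cdot \frac{[2\sigma_x^2(1 - \rho^2)]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\, \bar p^{-(T/2 + 1)} (\bar p^2 - \bar q^2)^{T/2 - 1/2} \exp\left[-\frac{1}{2\sigma_x^2(1 - \rho^2)}\{(1 + \rho^2)\bar p - 2\rho\bar q\}\right]$.

Changing variables from $\bar p$, $\bar q = \bar p\bar r$ to $\bar p$, $\bar r$, we obtain for $F(\bar p, \bar r)$, the approximate
joint distribution of $\bar p$ and $\bar r$, the expression

(19) $F(\bar p, \bar r) = \frac{T}{2} \cdot \frac{[2\sigma_x^2(1 - \rho^2)]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\, \bar p^{T/2 - 1} (1 - \bar r^2)^{T/2 - 1/2} \exp\left[-\frac{\bar p}{2\sigma_x^2(1 - \rho^2)}(1 + \rho^2 - 2\rho\bar r)\right]$.

We could also have derived (19), following Madow, by noting that for $\rho = 0$,
$\bar p$ and $\bar r$ are independently distributed, $\bar p$ having the chi-squared distribution and $\bar r$
having approximately the Dixon distribution (7), and that $\bar p$ and $\bar r$ are sufficient
statistics for the estimation of $\rho$ and $\sigma_x^2$.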

4. The approximate marginal distribution $R_\rho(\bar r)$ of $\bar r$ is obtained by an easy
integration from (19)

(20) $R_\rho(\bar r) = \int_{0}^{\infty} F(\bar p, \bar r)\, d\bar p = \frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)} (1 - \bar r^2)^{T/2 - 1/2} (1 + \rho^2 - 2\rho\bar r)^{-T/2}$.

Our notation is consistent since $R_\rho(\bar r)$ indeed reduces to the Dixon distribution
for $\rho = 0$. $R_\rho(\bar r)$ has a maximum when

$\bar r = \bar r_{\max} = \frac{1}{2\rho(T - 2)}\left\{(1 + \rho^2)(T - 1) - \sqrt{T(T - 2)(1 - \rho^2)^2 + (1 + \rho^2)^2}\right\}$.

A little manipulation shows that $1 > |\bar r_{\max}| > |\rho|$ and that $\bar r_{\max} = \rho$ asymptotically.
A graph (Fig. 1) of $R_\rho(\bar r)$ for $T = 20$, $\rho = 0, .2, .5, .7, .9$ is appended
from which it is seen that for $|\rho|$ near 1, the distribution becomes highly concentrated
about $\bar r_{\max}$. On differentiating $R_\rho(\bar r)$ with respect to $\rho$ and eliminating,
the envelope of the $R_\rho(\bar r)$ is seen to be

$\frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)} (1 - \bar r^2)^{-1/2}$.

Fig. 1. Graph of the Distribution of the Serial Correlation Coefficient in a Circular
Universe, for T = 20


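5. Before evaluating the moments of $R_\rho(\bar r)$ we will pause to obtain the approximate
marginal distribution $P_\rho(\bar p)$ of $\bar p$, and its moments. We write

(21) $P_\rho(\bar p) = \int_{-1}^{1} F(\bar p, \bar r)\, d\bar r = \frac{T}{2} \cdot \frac{[2\sigma_x^2(1 - \rho^2)]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\, \bar p^{T/2 - 1} \exp\left[-\frac{\bar p}{2\sigma_x^2}\left(\frac{1 + \rho^2}{1 - \rho^2}\right)\right] \int_{-1}^{1} (1 - \bar r^2)^{T/2 - 1/2} \exp\left[\frac{\rho\bar p\bar r}{\sigma_x^2(1 - \rho^2)}\right] d\bar r$.

If we define $I_\nu(z)$, the Bessel function of order $\nu$ and purely imaginary
argument by

(22) $I_\nu(z) = \sum_{n=0}^{\infty} \frac{(z/2)^{\nu + 2n}}{n!\,\Gamma(\nu + n + 1)}$,

we obtain ([8], p. 79), if $\rho \ne 0$

(23) $P_\rho(\bar p) = \frac{T}{2\bar p\,\rho^{T/2}} \exp\left[-\frac{\bar p}{2\sigma_x^2}\left(\frac{1 + \rho^2}{1 - \rho^2}\right)\right] I_{T/2}\left(\frac{\rho\bar p}{\sigma_x^2(1 - \rho^2)}\right)$,

and if $\rho = 0$

(24) $P_0(\bar p) = \frac{(2\sigma_x^2)^{-T/2}}{\Gamma\left(\frac{T}{2}\right)}\, \bar p^{T/2 - 1} \exp\left[-\frac{\bar p}{2\sigma_x^2}\right]$,

on performing the integration indicated in (21). $P_0(\bar p)$ coincides with the
exact distribution of $\bar p$ for $\rho = 0$. An expression covering all moments of $P_\rho(\bar p)$ is
obtained from (16) by setting $v = 0$, differentiating, and setting $u = 0$. We have

(25) $\bar\varphi(u, 0) = \lambda(\rho)\left[\frac{1}{2}\left(y + \sqrt{y^2 - \frac{4\rho^2}{(1 - \rho^2)^2}}\right)\right]^{-T/2}$

hence

(26) $E[\bar p^k] = i^{-k}\, \frac{\partial^k}{\partial u^k} \bar\varphi(u, 0)\Big|_{u=0} = (-2\sigma_x^2)^k (1 - \rho^2)^{-T/2}\, \frac{d^k}{dy^k}\left[\frac{1}{2}\left(y + \sqrt{y^2 - \frac{4\rho^2}{(1 - \rho^2)^2}}\right)\right]^{-T/2}\Bigg|_{y = (1 + \rho^2)/(1 - \rho^2)}$.

From (26), we readily find

(27) $E[\bar p] = T\sigma_x^2$, $E\left[\frac{\bar p}{T}\right] = \sigma_x^2$

(28) $E[\bar p^2] = (T\sigma_x^2)^2 + 2T\sigma_x^4\left(\frac{1 + \rho^2}{1 - \rho^2}\right)$, $\sigma^2_{\bar p/T} = \frac{2\sigma_x^4}{T}\left(\frac{1 + \rho^2}{1 - \rho^2}\right)$.

Thus the unbiased character of $\bar p/T$ as an estimate of $\sigma_x^2$ is reflected in the approximate
distribution, while (28), which shows that $\lim_{T\to\infty} \sigma^2_{\bar p/T} = 0$, indicates
that consistency is also reflected.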
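
The first two moments of $\bar p$ can be recovered from the logarithm of the smoothed characteristic function. The following worked derivation is a sketch under explicit assumptions: it writes $\sigma_x^2$ for the variance of the observations and takes $y(u) = \frac{1+\rho^2}{1-\rho^2} - 2i\sigma_x^2 u$, $z = \frac{\rho}{1-\rho^2}$ and $\bar\varphi(u,0) = \lambda(\rho)\left[\tfrac{1}{2}(y + s)\right]^{-T/2}$ with $s = \sqrt{y^2 - 4z^2}$; the cumulant symbols $\kappa_1$, $\kappa_2$ are introduced here only for the illustration.

```latex
\begin{align*}
\log\bar\varphi(u,0) &= \log\lambda(\rho) - \frac{T}{2}\log\frac{y+s}{2},
  \qquad \frac{dy}{du} = -2i\sigma_x^2,
  \qquad \frac{ds}{dy} = \frac{y}{s},\\[4pt]
\frac{d}{du}\log\bar\varphi(u,0)
  &= -\frac{T}{2}\cdot\frac{1}{s}\cdot\frac{dy}{du}
   = \frac{iT\sigma_x^2}{s},\\[4pt]
\frac{d^2}{du^2}\log\bar\varphi(u,0)
  &= -\frac{iT\sigma_x^2}{s^2}\cdot\frac{y}{s}\cdot\frac{dy}{du}
   = -\frac{2T\sigma_x^4\,y}{s^3}.
\end{align*}
% At u = 0: y = (1+\rho^2)/(1-\rho^2), z = \rho/(1-\rho^2), hence s = 1.
\begin{align*}
\kappa_1 &= \frac{1}{i}\,\frac{d}{du}\log\bar\varphi\Big|_{u=0} = T\sigma_x^2,
&
\kappa_2 &= -\frac{d^2}{du^2}\log\bar\varphi\Big|_{u=0}
          = 2T\sigma_x^4\,\frac{1+\rho^2}{1-\rho^2},\\[4pt]
E[\bar p^{\,2}] &= \kappa_1^2 + \kappa_2
  = (T\sigma_x^2)^2 + 2T\sigma_x^4\,\frac{1+\rho^2}{1-\rho^2},
&
\sigma^2_{\bar p/T} &= \frac{\kappa_2}{T^2}
  = \frac{2\sigma_x^4}{T}\cdot\frac{1+\rho^2}{1-\rho^2}.
\end{align*}
```

Under these assumptions the variance of $\bar p/T$ is of order $1/T$ for every fixed $|\rho| < 1$, which is the consistency statement in the text.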


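6. We now calculate the moments of $R_\rho(\bar r)$. Interchanging the order of integration
in the expression for $E[\bar r^k]$ is justified by the uniform convergence, so we have

(29) $E[\bar r^k] = \int_{-1}^{1} \bar r^k \left\{\int_{0}^{\infty} F(\bar p, \bar r)\, d\bar p\right\} d\bar r = \int_{0}^{\infty} \left\{\int_{-1}^{1} \bar r^k F(\bar p, \bar r)\, d\bar r\right\} d\bar p$

$= \frac{T}{2} \cdot \frac{[2\sigma_x^2(1 - \rho^2)]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)} \int_{0}^{\infty} \bar p^{T/2 - 1} \exp\left[-\frac{\bar p}{2\sigma_x^2}\left(\frac{1 + \rho^2}{1 - \rho^2}\right)\right] \left\{\int_{-1}^{1} \bar r^k (1 - \bar r^2)^{T/2 - 1/2} \exp(m\bar r)\, d\bar r\right\} d\bar p$

where $m$ is defined by

(30) $m = \frac{\rho\bar p}{\sigma_x^2(1 - \rho^2)}$.

Defining $G(m)$ by

(31) $G(m) = \int_{-1}^{1} (1 - \bar r^2)^{T/2 - 1/2} \exp(m\bar r)\, d\bar r$

we have ([8], p. 79)

(32) $G(m) = \Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)\left(\frac{2}{m}\right)^{T/2} I_{T/2}(m)$.

Differentiating each side of (32) $k$ times, we find by (31) and (32)

(33) $\frac{d^k}{dm^k} G(m) = \int_{-1}^{1} \bar r^k (1 - \bar r^2)^{T/2 - 1/2} \exp(m\bar r)\, d\bar r = \Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right) 2^{T/2}\, \frac{d^k}{dm^k}\left[m^{-T/2} I_{T/2}(m)\right]$.

Using the identity ([8], p. 79)

$\frac{d}{dz}\left[z^{-\nu} I_\nu(z)\right] = z^{-\nu} I_{\nu+1}(z)$

and changing the integration variable in (29) from $\bar p$ to $m$, we obtain finally

(34) $E[\bar r^k] = \frac{T}{2}\, \rho^{-T/2} \int_{0}^{\infty} m^{T/2 - 1} \exp\left[-\frac{m(1 + \rho^2)}{2\rho}\right] \frac{d^k}{dm^k}\left[m^{-T/2} I_{T/2}(m)\right] dm$.

For $k = 1$, we have ([8], p. 386)

(35) $E[\bar r] = \frac{T}{T + 2}\, \rho$.

For $k = 2$, after some tedious calculation, we find

(36) $E[\bar r^2] = \frac{1}{T + 2} + \frac{\rho^2\, T(T + 1)}{(T + 2)(T + 4)}$, $\sigma_{\bar r}^2 = \frac{1}{T + 2}\left[1 - \frac{\rho^2\, T(T - 2)}{(T + 2)(T + 4)}\right]$.

We note that $\lim_{T\to\infty} E(\bar r) = \rho$ and $\lim_{T\to\infty} \sigma_{\bar r}^2 = 0$, so that at least to the extent of
approximation furnished by $R_\rho(\bar r)$, $\bar r$ is a consistent estimate of $\rho$.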
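
The moment formulas can be checked against the approximate density by direct numerical integration. The sketch below assumes the density $R_\rho(\bar r) = \frac{\Gamma(T/2+1)}{\Gamma(1/2)\Gamma(T/2+1/2)}(1-\bar r^2)^{(T-1)/2}(1+\rho^2-2\rho\bar r)^{-T/2}$ on $(-1, 1)$ and compares a midpoint-rule integral with $E[\bar r] = T\rho/(T+2)$ and with the variance expression in the text; the function names and the step count are illustrative choices, not part of the paper.

```python
import math


def leipnik_density(r, rho, T):
    """Assumed approximate density of the circular serial correlation coefficient."""
    log_c = math.lgamma(T / 2 + 1) - math.lgamma(0.5) - math.lgamma(T / 2 + 0.5)
    return (math.exp(log_c) * (1 - r * r) ** ((T - 1) / 2)
            * (1 + rho * rho - 2 * rho * r) ** (-T / 2))


def moment(k, rho, T, steps=20000):
    """k-th moment about the origin by the midpoint rule on (-1, 1)."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        r = -1.0 + (i + 0.5) * h
        total += r**k * leipnik_density(r, rho, T)
    return total * h


def mean_formula(rho, T):
    return T * rho / (T + 2)


def variance_formula(rho, T):
    return (1 - rho**2 * T * (T - 2) / ((T + 2) * (T + 4))) / (T + 2)


if __name__ == "__main__":
    rho, T = 0.5, 20
    m0, m1, m2 = (moment(k, rho, T) for k in range(3))
    print(m0, m1, mean_formula(rho, T), m2 - m1 * m1, variance_formula(rho, T))
```

For $T = 20$ and $\rho = 0.5$ the mean is $10/22$, noticeably below $\rho$, which illustrates that the consistency of $\bar r$ is an asymptotic statement and that the estimate is biased toward zero in moderate samples.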

The author wishes to express his gratitude to Dr. T. Koopmans, under whose 
kind direction this paper was written. 

REFERENCES 

[1] T. Koopmans, “Serial correlation and quadratic forms in normal variables”, Annals of
Math. Stat., Vol. 13 (1942), pp. 14-33.

[2] W. J. Dixon, “Further contributions to the problem of serial correlation”, Annals of
Math. Stat., Vol. 15 (1944), pp. 119-144.

[3] H. Rubin, “On the distribution of the serial correlation coefficient”, Annals of Math.
Stat., Vol. 16 (1945), pp. 211-215.

[4] R. L. Anderson, “Distribution of the serial correlation coefficient”, Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[5] W. G. Madow, “Note on the distribution of the serial correlation coefficient”, Annals of
Math. Stat., Vol. 16 (1945), pp. 308-310.

[6] B. O. Peirce, A Short Table of Integrals, Ginn and Co., 1929.

[7] G. A. Campbell and R. M. Foster, Fourier Integrals for Practical Applications, Bell
Tel. Tech. Pub., 1942.

[8] G. N. Watson, A Treatise on the Theory of Bessel Functions, Second Rev. ed., Cambridge
University Press, 1944.



CONCERNING THE EFFECT OF INTRACLASS CORRELATION ON 
CERTAIN SIGNIFICANCE TESTS 

By John E. Walsh 
Princeton University 

1. Summary. In practical applications it is frequently assumed that the 
values obtained by a sampling process are independently drawn from the same 
normal population. Then confidence intervals and significance tests which were 
derived under the assumption of independence are applied using these values. 
Often the assumption of independence between the values may be at best only 
approximately valid. For some cases, however, it may be permissible to assume 
that the correlation between each two values is the same (intraclass correlation). 
The purpose of this paper is to investigate the effect of this intraclass correlation 
on the confidence coefficients and significance levels of several well known 
confidence intervals and significance tests which were derived under the assump¬ 
tion of independence, and to extend these considerations to the case of two 
sets of values. 

In the first part of the paper the relations given in Table I are used to compute 
tables which show the effect of intraclass correlation on the confidence coefficients 
and significance levels of the confidence intervals and significance tests listed in 
Table II. The second part of the paper consists of the proofs of the relations 
given in Table I. 

2. Introduction. Let the $n$ values $x_1, \cdots, x_n$ represent a single value of a
normal multivariate population for which each of the $n$ variables has mean $\mu$,
variance $\sigma^2$, and the correlation between each two variables is $\rho$. These $n$
values will be called a correlated “sample.” The values $x_1, \cdots, x_n$ and
$y_1, \cdots, y_m$ are said to represent two correlated “samples” if they have a normal
multivariate distribution such that the $x$'s have mean $\mu$, variance $\sigma^2$, correlation
$\rho$, the $y$'s have mean $\mu'$, variance $\sigma'^2$, correlation $\rho'$, and the correlation between
each $x$ and $y$ is $\rho''$. This paper shows that several well known quantities which
have Student $t$, or Snedecor $F$ distributions when the values form random
samples still have these same distributions for correlated “samples” if the quantities
are multiplied by suitable constant factors, where it is to be remembered
that for normal populations a correlated “sample” is a random sample if and
only if $\rho = 0$ and that two correlated “samples” represent two random samples
if and only if $\rho = \rho' = \rho'' = 0$. The quantities considered and the corresponding
factors are listed in Table I, where $\bar x = \sum_{1}^{n} x_i/n$ and $\bar y = \sum_{1}^{m} y_\alpha/m$. Several commonly
used confidence intervals and significance tests based on these quantities
and derived under the assumption of randomness are considered, and tables are
computed which show how the confidence coefficients and significance levels of


EFFECT OF INTRACLASS CORRELATION 


89 


these confidence intervals and significance tests vary if the values are from
correlated “samples” instead of random samples. Table II contains an outline
of the confidence intervals and significance tests considered. It is found that
these confidence coefficients and significance levels can change noticeably when a
correlated “sample” is considered. This is particularly true for the Student
t-test. For example, in one case it is found that if the sample size is 32 and the
significance level is .05 when $\rho = 0$, then the significance level becomes .23 for
$\rho = .05$. This large change in significance level for a small change in $\rho$ is
explained by the factor given for the Student t-distribution in Table I. This
shows that test results which appear to be “significant” under the assumption of
randomness are not necessarily “significant” when correlation is present, even
though the amount of correlation may be small. The effect of correlation on the
$\chi^2$ and Snedecor F tests is not as great as for the Student t-test, as can be seen from
the factors given for the $\chi^2$ and Snedecor F distributions in Table I.


TABLE I

Quantity | Distribution for Random Sample | Factor Multiplying Statistic for Correlated “Samples”
$\dfrac{(\bar{x} - \mu)\sqrt{n(n-1)}}{S} = \dfrac{(\bar{x} - \mu)\sqrt{n(n-1)}}{\sqrt{\sum_1^n (x_i - \bar{x})^2}}$ | Student t-distribution, $g_{n-1}(t)\,dt$ | $\sqrt{\dfrac{1 - \rho}{1 + (n-1)\rho}}$
$\dfrac{S^2}{\sigma^2} = \dfrac{\sum_1^n (x_i - \bar{x})^2}{\sigma^2}$ | $\chi^2$-distribution, $f_{n-1}(\chi^2)\,d\chi^2$ | $\dfrac{1}{1 - \rho}$
$\dfrac{\sigma'^2 S^2}{\sigma^2 S'^2} = \dfrac{\sigma'^2 \sum_1^n (x_i - \bar{x})^2}{\sigma^2 \sum_1^m (y_\alpha - \bar{y})^2}$ | Snedecor F-distribution, $h_{n-1,m-1}(F)\,dF$ | $\dfrac{1 - \rho'}{1 - \rho}$

3. Effect of intraclass correlation. The relations stated in Table I will now
be used to investigate the effect of intraclass correlation on the confidence
coefficients and significance levels of several common types of confidence intervals
and significance tests which were derived under the assumption of random
samples. The confidence intervals and significance tests considered are listed
in Table II, where $S^2$ and $S'^2$ are defined in Table I. These particular confidence
intervals and significance tests have the property that if $\alpha$ is the confidence
coefficient of the confidence interval listed for a given statistic, then $1 - \alpha$ is
the significance level of the significance test listed for that statistic, this relation
holding whether random samples or correlated “samples” are considered. For
this reason the tables given in this section will be limited to confidence
coefficients; the corresponding significance levels can be obtained by using the above
relation.

a. Student t-distribution. If a random sample of size n is drawn from a normal
population with mean $\mu$ and variance $\sigma^2$ (denoted by $N(\mu, \sigma^2)$), a confidence
interval for $\mu$ with confidence coefficient $\varepsilon$ is given in Table II. If the n values
form a correlated “sample”, however, it follows from Table I that the corresponding
confidence interval with coefficient $\varepsilon$ is

$$\bar{x} - t_\varepsilon S \sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}} \le \mu \le \bar{x} + t_\varepsilon S \sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}}.$$


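TABLE II

Statistic | Parameter Examined | Confidence Interval (Confidence Coefficient $\varepsilon$) | Significance Test (Significance Level $1 - \varepsilon$) | Definitions of Constants
t | $\mu$ | $\bar{x} - \dfrac{t_\varepsilon S}{\sqrt{n(n-1)}} \le \mu \le \bar{x} + \dfrac{t_\varepsilon S}{\sqrt{n(n-1)}}$ | $|\bar{x} - \mu_0| \ge t_\varepsilon S/\sqrt{n(n-1)}$ | $\int_{-t_\varepsilon}^{t_\varepsilon} g_{n-1}(t)\,dt = \varepsilon$
$\chi^2$ | $\sigma^2$ | $0 \le \sigma^2 \le S^2/\chi^2_\varepsilon$ | $S^2/\chi^2_\varepsilon < \sigma_0^2$ | $\int_{\chi^2_\varepsilon}^{\infty} f_{n-1}(\chi^2)\,d\chi^2 = \varepsilon$
F | $\sigma^2/\sigma'^2$ | $0 \le \sigma^2/\sigma'^2 \le S^2/(S'^2 F_\varepsilon)$ |  | $\int_{F_\varepsilon}^{\infty} h_{n-1,m-1}(F)\,dF = \varepsilon$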


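The confidence interval given in Table II can be rewritten as

$$\bar{x} - t_\alpha S \sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}} \le \mu \le \bar{x} + t_\alpha S \sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}},$$

where

$$t_\alpha = t_\varepsilon \sqrt{\frac{1 - \rho}{1 + (n-1)\rho}}.$$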

Hence if $\rho < 0$, $\alpha > \varepsilon$ and the confidence coefficient of the confidence interval
in Table II is greater than $\varepsilon$. This means that the significance level of the
corresponding significance test listed in Table II would be less than $1 - \varepsilon$ so
that any test result which would be significant for a random sample would also
be significant for a correlated “sample” for which $\rho < 0$. If $\rho > 0$, however,
$\varepsilon > \alpha$ and the significance level of the test would be greater than $1 - \varepsilon$. Thus a
test result which would be significant for a random sample need no longer be
significant when $\rho > 0$. The effect of positive values of $\rho$ upon the confidence coefficient
$\alpha = \alpha_t(\rho, n)$ of the confidence interval of Table II is given in Table III for the
cases $\varepsilon = .95$ and .99. Confidence intervals with unequal tails can be treated
in a similar manner. It is thus seen that the effect of correlation on the confidence
coefficient increases with the sample size n, and that even a very small
amount of correlation can cause a large change in $\alpha$. For example, for samples
of size 16 a correlation of $\rho = .05$ will change the significance level from .05 to
.135; for samples of size 32 a correlation of $\rho = .05$ will change the significance
level from .01 to .102, and from .05 to .23.
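
The relation $t_\alpha = t_\varepsilon \sqrt{(1-\rho)/(1+(n-1)\rho)}$ can be evaluated numerically. The following is a minimal sketch, not the author's computation: the function names (`t_pdf`, `central_prob`, `t_eps`, `alpha_t`) are hypothetical, and the Student t probabilities are obtained by Simpson's rule and bisection rather than from printed tables, so the last digit may differ slightly from the tabulated values.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def central_prob(t, df, steps=2000):
    """P(-t <= T <= t) for T ~ Student t(df), by composite Simpson's rule."""
    if t <= 0:
        return 0.0
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # the density is symmetric about zero


def t_eps(eps, df):
    """Two-sided point t_eps with central_prob(t_eps, df) = eps, by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < eps:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def alpha_t(rho, n, eps):
    """Actual confidence coefficient of the nominal-eps t interval when the
    n values have intraclass correlation rho (requires rho > -1/(n-1))."""
    factor = math.sqrt((1 - rho) / (1 + (n - 1) * rho))
    return central_prob(t_eps(eps, n - 1) * factor, n - 1)


if __name__ == "__main__":
    # Significance levels 1 - alpha for rho = .05 and nominal level .05.
    for n in (16, 32):
        print(n, round(1 - alpha_t(0.05, n, 0.95), 3))
```

Calling `alpha_t(0.0, n, eps)` returns `eps` itself, which is a quick check that the factor reduces to 1 for a random sample.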

Confidence intervals for $\mu - \mu'$ are given by Theorem 5 of section 4. It is to be
observed that if $\rho = \rho' = \rho''$ and $\sigma = \sigma'$ the confidence coefficients are independent
of $\rho$ and $\sigma$. If $m = n$, $\rho = \rho'$, $\sigma = \sigma'$, $\rho'' = 0$, however, the confidence
coefficients of the confidence intervals for $\mu - \mu'$ have the values $\alpha = \alpha_t(\rho, n)$
given in Table III.


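TABLE III
Values of $\alpha_t(\rho, n)$

n | $\rho$ = 0 | .05 | .1 | .2 | .3 | .4 | .5
4 | .99 |  | .983 | .974 | .961 | .944 | .920
4 | .95 |  | .921 |  |  | .805 | .744
8 | .99 |  | .959 |  | .853 | .790 |
8 | .95 |  | .865 |  |  | .620 |
16 | .99 |  |  |  |  | .600 | .515
16 | .95 | .865 | .74 |  | .54 |  |
32 | .99 |  | .79 | .63 |  |  |
32 | .95 |  | .68 |  |  |  |
64 | .99 | .79 |  |  |  |  |
128 | .99 | .68 |  |  |  |  |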

b. $\chi^2$-distribution. If a random sample of size n is drawn from $N(\mu, \sigma^2)$, a confidence
interval for $\sigma^2$ with coefficient $\varepsilon$ is given in Table II. If the n values form
a correlated “sample”, it follows from Table I that the corresponding confidence
interval with coefficient $\varepsilon$ is

$$0 \le \sigma^2 \le S^2/[\chi^2_\varepsilon (1 - \rho)].$$

The confidence interval in Table II can be rewritten as

$$0 \le \sigma^2 \le S^2/[\chi^2_\alpha (1 - \rho)],$$

where

$$\chi^2_\alpha = \chi^2_\varepsilon/(1 - \rho).$$

Hence if $\rho < 0$, $\alpha > \varepsilon$ and the significance level of the significance test given in
Table II is less than $1 - \varepsilon$. If $\rho > 0$, the significance level of the test is greater
than $1 - \varepsilon$. The effect of positive values of $\rho$ upon the confidence coefficient
$\alpha = \alpha_{\chi^2}(\rho, n)$ of the confidence interval listed in Table II is given in Table IV
for $\varepsilon = .95$ and .99. Cases in which the lower limit of the confidence interval
is not zero can be treated in a similar manner. Table IV shows that the confidence
coefficient $\alpha = \alpha_{\chi^2}(\rho, n)$ decreases with the sample size n for a fixed value
of $\rho$. Although the effect of correlation for the $\chi^2$-distribution is not as great as
for the Student t-distribution, it does cause a noticeable change in $\alpha$. For
example, for samples of size 16 the significance level of the test in Table II is
changed from .05 to .081 if $\rho = .1$ and from .05 to .13 if $\rho = .2$. For samples of
size 32 the significance level is changed from .05 to .10 for $\rho = .1$ and from .05 to
.19 for $\rho = .2$.
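
The relation $\chi^2_\alpha = \chi^2_\varepsilon/(1-\rho)$ can be checked numerically in the same spirit. The sketch below is an illustration under stated assumptions: the helper names (`gamma_p`, `chi2_upper`, `chi2_eps`, `alpha_chi2`) are hypothetical, and the $\chi^2$ tail probability is computed from the power series of the incomplete gamma function rather than from tables.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    k = a
    while abs(term) > 1e-16 * abs(total):
        k += 1
        term *= x / k          # next term x^j / (a (a+1) ... (a+j))
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_upper(x, df):
    """P(chi-square with df degrees of freedom >= x)."""
    return 1.0 - gamma_p(df / 2, x / 2)


def chi2_eps(eps, df):
    """Point whose upper-tail probability is eps, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper(hi, df) > eps:
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_upper(mid, df) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def alpha_chi2(rho, n, eps):
    """Actual confidence coefficient of the nominal-eps interval for sigma^2
    when the n values have intraclass correlation rho < 1."""
    return chi2_upper(chi2_eps(eps, n - 1) / (1 - rho), n - 1)


if __name__ == "__main__":
    for n in (16, 32):
        for rho in (0.1, 0.2):
            print(n, rho, round(1 - alpha_chi2(rho, n, 0.95), 3))
```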

TABLE IV
Values of $\alpha_{\chi^2}(\rho, n)$

n | $\rho$ = 0 | .1 | .2 | .3 | .4 | .5
4 | .99 | .988 | .986 | .983 | .979 | .971
4 | .95 | .941 | .930 | .918 | .900 | .872
16 | .99 | .982 | .966 | .941 | .890 | .790
16 | .95 | .919 | .87 | .79 | .67 | .49
32 | .99 | .975 | .946 | .867 | .715 | .44
32 | .95 | .90 | .81 | .64 | .38 | .17


c. Snedecor F-distribution. If two random samples, one of size n (denoted
by x’s) and the other of size m (denoted by y’s), are drawn from $N(\mu, \sigma^2)$
and $N(\mu', \sigma'^2)$ respectively, a confidence interval for $\sigma^2/\sigma'^2$ with coefficient $\varepsilon$
is given in Table II. If the values form two correlated “samples”, however,
it follows from Table I that the corresponding confidence interval with coefficient
$\varepsilon$ is

$$0 \le \sigma^2/\sigma'^2 \le \frac{S^2 (1 - \rho')}{S'^2 (1 - \rho)} \Big/ F_\varepsilon .$$

The confidence interval in Table II can be restated as

$$0 \le \sigma^2/\sigma'^2 \le \frac{S^2 (1 - \rho')}{S'^2 (1 - \rho)} \Big/ F_\alpha ,$$

where

$$F_\alpha = F_\varepsilon (1 - \rho')/(1 - \rho).$$

Thus if $\rho = \rho'$, $\alpha = \varepsilon$ and the significance level of the significance test given in
Table II remains equal to $1 - \varepsilon$. If $(1 - \rho')/(1 - \rho) < 1$, $\alpha > \varepsilon$ and the
significance level is less than $1 - \varepsilon$. If $(1 - \rho')/(1 - \rho) > 1$, however, $\alpha < \varepsilon$
and the significance level is greater than $1 - \varepsilon$. Values of the confidence
coefficient $\alpha = \alpha_F\left(\frac{1 - \rho'}{1 - \rho}, n, m\right)$ of the confidence interval listed in Table II are
given in Table V for $\varepsilon = .95$ and .99. Cases in which the lower limit of the
confidence interval is not zero can be treated in a manner similar to that given
above. Table V indicates that the effect of correlation on the confidence
coefficient is not as great for $n < m$ as for $n > m$. For example, if $n = 4$, $m = 32$,
$(1 - \rho')/(1 - \rho) = 1.25$, the significance level of the significance test given in Table II is
only changed from .05 to .069; if $(1 - \rho')/(1 - \rho) = 1.5$, from .05 to .087. If $n = 32$, $m = 4$,
$(1 - \rho')/(1 - \rho) = 1.25$, however, the significance level is changed from .05 to .094; if
$(1 - \rho')/(1 - \rho) = 1.5$, from .05 to .142. Also it is seen that for fixed $(1 - \rho')/(1 - \rho)$, the effect of
intraclass correlation increases with both n and m.


TABLE V
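
The dependence of the confidence coefficient on the ratio $(1-\rho')/(1-\rho)$ can be sketched numerically as well. This is an illustration only: the names (`f_pdf`, `f_cdf`, `f_eps`, `alpha_f`) are hypothetical, the standard F distribution with $(n-1, m-1)$ degrees of freedom is assumed, and the Simpson integration used here assumes $n \ge 4$ so that the density is bounded at the origin.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom.
    Assumes d1 >= 3, so the density tends to zero as x tends to zero."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_f = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
             - ((d1 + d2) / 2) * math.log1p(d1 * x / d2) - log_beta)
    return math.exp(log_f)


def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by composite Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_eps(eps, d1, d2):
    """Point F_eps with P(F >= F_eps) = eps, found by bisection."""
    target = 1.0 - eps
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < target:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def alpha_f(ratio, n, m, eps):
    """Actual confidence coefficient when ratio = (1 - rho') / (1 - rho)."""
    d1, d2 = n - 1, m - 1
    return 1.0 - f_cdf(f_eps(eps, d1, d2) * ratio, d1, d2)


if __name__ == "__main__":
    for n, m in ((4, 32), (32, 4)):
        for ratio in (1.25, 1.5):
            print(n, m, ratio, round(1 - alpha_f(ratio, n, m, 0.95), 3))
```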












4. Analysis. This section contains derivations of the relations stated in the
first three sections. The method used in these derivations is similar to that used
in one approach to the analysis of variance and consists essentially in expressing
each variable as the sum of two quantities, one of which is the same for each
variable and the other of which is different for each variable.

Let $x_1, \dots, x_n$ represent a correlated “sample”, that is, have a normal
multivariate distribution for which

(1)
$$E(x_i) = \mu, \qquad E[(x_i - \mu)^2] = \sigma^2, \qquad (i = 1, \dots, n)$$
$$E[(x_i - \mu)(x_j - \mu)] = \rho\sigma^2, \qquad (i \ne j = 1, \dots, n).$$

Write the $x_i$, $(i = 1, \dots, n)$, in the form

$$x_i = \eta + \lambda\bar{\xi} + \xi_i,$$

where $\bar{\xi} = \sum_1^n \xi_i/n$ and $\eta, \xi_1, \dots, \xi_n$ are independently distributed, $\eta$ according to
$N(\mu, \sigma_\eta^2)$ and the $\xi_i$ according to $N(0, \sigma_\xi^2)$. The values of $\lambda$, $\sigma_\eta^2$ and $\sigma_\xi^2$ are chosen
so that the $x_i = \eta + \lambda\bar{\xi} + \xi_i$ satisfy (1). It is easily proved that it is always
possible to choose $\lambda$, $\sigma_\eta^2$ and $\sigma_\xi^2$ so that (1) are satisfied. It is to be remembered
that $\rho \ge -1/(n - 1)$ for intraclass correlation. From relations (1) and
$x_i = \eta + \lambda\bar{\xi} + \xi_i$ it follows that

(2) $$E(\xi_i^2) = \sigma^2(1 - \rho), \qquad (i = 1, \dots, n).$$


Theorem 1. The quantity $\dfrac{1}{\sigma^2(1 - \rho)} \sum_1^n (x_i - \bar{x})^2$ has a $\chi^2$-distribution with
n − 1 degrees of freedom and is distributed independently of $\bar{x}$.

Proof. Since the $\xi_i$ are independently distributed according to the same
normal distribution with zero mean, it follows from (2) that

$$\frac{1}{E(\xi_i^2)} \sum_1^n (\xi_i - \bar{\xi})^2 = \frac{1}{\sigma^2(1 - \rho)} \sum_1^n (x_i - \bar{x})^2$$

has a $\chi^2$-distribution with n − 1 degrees of freedom and is distributed independently
of $\bar{x} = \eta + (1 + \lambda)\bar{\xi}$.

Theorem 2. $\sqrt{\dfrac{1 - \rho}{1 + (n - 1)\rho}} \cdot \dfrac{(\bar{x} - \mu)\sqrt{n(n-1)}}{\sqrt{\sum_1^n (x_i - \bar{x})^2}}$ has a Student t-distribution
with n − 1 degrees of freedom.

Proof. It is easily seen from elementary considerations that

$$\frac{(\bar{x} - \mu)\sqrt{n}}{\sigma\sqrt{1 + (n - 1)\rho}}$$

has the distribution N(0, 1). Theorem 2 is then an immediate consequence of
Theorem 1.
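
Theorems 1 and 2 can be checked by simulation. The sketch below is an assumption-laden illustration, not the construction used in the proofs: it generates an equicorrelated normal “sample” from one shared standard normal component (which only covers $\rho \ge 0$), and the function names are hypothetical. It returns the average of $S^2/[\sigma^2(1-\rho)]$, which should be near n − 1, and the average squared value of the standardised mean, which should be near 1.

```python
import math
import random


def correlated_sample(n, mu, sigma, rho, rng):
    """One equicorrelated normal 'sample' with 0 <= rho < 1: each value shares
    the component sqrt(rho) * z0 and has its own part sqrt(1 - rho) * z_i."""
    shared = rng.gauss(0.0, 1.0)
    return [mu + sigma * (math.sqrt(rho) * shared
                          + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n)]


def simulate(n=8, mu=1.0, sigma=2.0, rho=0.3, reps=20000, seed=0):
    """Monte Carlo averages of the two standardised quantities."""
    rng = random.Random(seed)
    q_sum = 0.0      # sum of S^2 / (sigma^2 (1 - rho)) over replications
    z_sq_sum = 0.0   # sum of squared standardised means
    for _ in range(reps):
        x = correlated_sample(n, mu, sigma, rho, rng)
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x)
        q_sum += s2 / (sigma ** 2 * (1 - rho))
        z = (xbar - mu) * math.sqrt(n) / (sigma * math.sqrt(1 + (n - 1) * rho))
        z_sq_sum += z * z
    return q_sum / reps, z_sq_sum / reps


if __name__ == "__main__":
    mean_q, mean_z2 = simulate()
    print("mean of S^2/(sigma^2(1-rho)):", round(mean_q, 3))
    print("mean of squared standardised mean:", round(mean_z2, 3))
```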

Up to this point a single correlated “sample” of size n has been considered. 
The next part of the analysis, however, will be concerned with properties which 
arise from the consideration of two correlated “samples.” 





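Let $x_1, \dots, x_n, y_1, \dots, y_m$ have a joint normal multivariate distribution
such that

(3)
$$E(x_i) = \mu, \qquad E[(x_i - \mu)^2] = \sigma^2, \qquad (i = 1, \dots, n)$$
$$E(y_\alpha) = \mu', \qquad E[(y_\alpha - \mu')^2] = \sigma'^2, \qquad (\alpha = 1, \dots, m)$$
$$E[(x_i - \mu)(x_j - \mu)] = \rho\sigma^2, \qquad (i \ne j = 1, \dots, n)$$
$$E[(y_\alpha - \mu')(y_\beta - \mu')] = \rho'\sigma'^2, \qquad (\alpha \ne \beta = 1, \dots, m)$$
$$E[(x_i - \mu)(y_\alpha - \mu')] = \rho''\sigma\sigma'.$$

Write the $x_i$ and $y_\alpha$ in the form

(4)
$$x_i = \eta + \lambda_1\bar{\xi} + \lambda_2\bar{\xi}' + \xi_i$$
$$y_\alpha = \eta' + \lambda_1'\bar{\xi} + \lambda_2'\bar{\xi}' + \xi_\alpha',$$

where $\bar{\xi}' = \sum_1^m \xi_\alpha'/m$ and $\eta, \eta', \xi_1, \dots, \xi_n, \xi_1', \dots, \xi_m'$ are independently
distributed, $\eta$ according to $N(\mu, \sigma_\eta^2)$, $\eta'$ according to $N(\mu', \sigma_{\eta'}^2)$, the $\xi_i$ according to
$N(0, \sigma_\xi^2)$, and the $\xi_\alpha'$ according to $N(0, \sigma_{\xi'}^2)$. The quantities $\lambda_1$, $\lambda_2$, $\lambda_1'$, $\lambda_2'$,
$\sigma_\eta^2$, $\sigma_{\eta'}^2$, $\sigma_\xi^2$, $\sigma_{\xi'}^2$ are chosen so that the $x_i$ and $y_\alpha$ satisfy (3). It is easily verified
that it is always possible to choose these quantities so that the $x_i$ and $y_\alpha$ constructed
in this fashion satisfy (3). In addition it follows from (3) and (4) that

(5)
$$E(\xi_i^2) = \sigma^2(1 - \rho)$$
$$E(\xi_\alpha'^2) = \sigma'^2(1 - \rho').$$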

Theorem 3. $\dfrac{1}{\sigma^2(1 - \rho)} \sum_1^n (x_i - \bar{x})^2$ and $\dfrac{1}{\sigma'^2(1 - \rho')} \sum_1^m (y_\alpha - \bar{y})^2$ have $\chi^2$-distributions
with n − 1 and m − 1 degrees of freedom respectively, and are distributed
independently of each other and of $\bar{x}$ and $\bar{y}$.

Proof. From Theorem 1 and (5) it follows that $\dfrac{1}{\sigma^2(1 - \rho)} \sum_1^n (x_i - \bar{x})^2$
and $\dfrac{1}{\sigma'^2(1 - \rho')} \sum_1^m (y_\alpha - \bar{y})^2$ have $\chi^2$-distributions with n − 1 and m − 1 degrees
of freedom respectively. That they are distributed independently of each other
and of both $\bar{x}$ and $\bar{y}$ follows from (4).

Theorem 4. $\dfrac{\sigma'^2(1 - \rho') \sum_1^n (x_i - \bar{x})^2}{\sigma^2(1 - \rho) \sum_1^m (y_\alpha - \bar{y})^2}$ is distributed according to the Snedecor
F-distribution $h_{n-1,m-1}(F)\,dF$.

Proof. This follows from Theorem 3.





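Theorem 5.

$$\frac{[(\bar{x} - \bar{y}) - (\mu - \mu')]\sqrt{n + m - 2}}{\sigma_1} \Bigg/ \sqrt{\frac{\sum_1^n (x_i - \bar{x})^2}{\sigma^2(1 - \rho)} + \frac{\sum_1^m (y_\alpha - \bar{y})^2}{\sigma'^2(1 - \rho')}},$$

where

$$\sigma_1^2 = \frac{\sigma^2}{n}[1 + (n - 1)\rho] + \frac{\sigma'^2}{m}[1 + (m - 1)\rho'] - 2\rho''\sigma\sigma',$$

has a Student t-distribution with n + m − 2 degrees of freedom.

Proof. It is easily seen from elementary considerations that $\dfrac{1}{\sigma_1}[(\bar{x} - \bar{y}) - (\mu - \mu')]$
has the distribution N(0, 1). Theorem 5 then follows from Theorem 3.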

The author wishes to express his appreciation to Professor John W. Tukey for 
valuable assistance and advice in the preparation of this paper. 



ON FAMILIES OF ADMISSIBLE TESTS 

By E. L. Lehmann 
University of California , Berkeley 

1. Summary. For each hypothesis H of a certain class of simple hypotheses, a
family F of tests is determined such that

(a) given any test w of H there exists a test w' belonging to F which has power
uniformly greater than or equal to that of w.

(b) no member of F has power uniformly greater than or equal to that of any
other member of F.

The effect on F of various assumptions about the set of alternatives is considered.
As an application an optimum property of the known type $A_1$ tests is
proved, and a result is obtained concerning the most stringent tests of the
hypotheses considered.

2. Introduction. In the theory of testing simple hypotheses, if a uniformly 
most powerful test exists, it is the most desirable test to use. If, as is generally 
the case, such a test does not exist, the choice between tests none of which is 
“altogether better” than all the others, has to be based on information not contained
in the general formulation of the testing problem. If no such additional
information is available, the choice must of necessity be somewhat arbitrary. 

Now although a single uniformly most powerful test exists only in exceptional 
cases, there will always exist a family F of tests such that 

(a) given any test w of the hypotheses H under consideration and of prescribed 
level of significance, there exists a test w' belonging to F which has power 
uniformly greater than or equal to that of w. 

(b) no member of F has power uniformly greater than or equal to that of any
other member of F. 

The family F is essentially unique. Arbitrariness occurs only since a test region 
is not uniquely determined by its power function. But since two tests with the 
same power function are equivalent for testing purposes, it is from the present 
point of view immaterial which one is included in F. 

With the same restriction F is essentially the family of admissible tests, a
test w being admissible if there is no test of the same level of significance which
has power uniformly greater than or equal to but not identically equal to that of
w. This definition differs only trivially from the one given by Wald [1, p. 15]
who defines a test w to be non-admissible if there exists a w' with power everywhere
greater than that of w (except at the hypothetical point).

F naturally depends on the class of alternatives considered. A restriction 
in the class of alternatives may (although it will not necessarily) diminish F. 
The family F may also be decreased by other additional information: For 
instance a probability distribution may be assumed for the set of alternatives, 
and some properties of this distribution may be presupposed. 



The determination of the family F (and a description of the power functions
of the tests in F) might be considered a solution of the testing problem. The
solution is not unique and hence does not provide a basis for action. This
reflects the fact that additional information is needed to make possible the
unique choice of a best test. On the basis of the available information, F represents
the furthest reduction of the problem that seems possible. On the one
hand, if the choice of test is to be made from the point of view of power, the only
contestants for “best test” are the members of F. On the other hand, the
available information does not give preference to any one member of F over any
other unless additional principles (such as unbiasedness for instance) are
introduced.

It is the purpose of the present paper to illustrate the above notions by determining
F for a very simple case.


3. Determination of the family F. Let the random variable
$E = (X_1, X_2, \dots, X_n)$
have a probability density function

(1) $$p_\theta(e)$$

depending on a parameter $\theta$. Concerning (1) we shall make the assumptions
under which Neyman [2, 3] has shown the existence of the type $A_1$ test of the
hypothesis

(2) $$H: \theta = \theta_0.$$

Assumptions:

(a) Conditions of regularity:
The integral

(3) $$\int_w p_\theta(e)\,de, \qquad e = (x_1, \dots, x_n), \quad de = dx_1 \cdots dx_n,$$

extended over any region w in the sample space, admits of two successive derivatives
with respect to $\theta$ under the integral sign, i.e.

(4) $$\frac{\partial^k}{\partial\theta^k} \int_w p_\theta(e)\,de = \int_w \frac{\partial^k}{\partial\theta^k} p_\theta(e)\,de \quad \text{for } k = 1, 2.$$

(b) A differential equation:

If

(5) $$\varphi_\theta(e) = \frac{\partial}{\partial\theta} \log p_\theta(e), \qquad \varphi_\theta'(e) = \frac{\partial}{\partial\theta} \varphi_\theta(e),$$

$\varphi_{\theta_0}$ is not identically zero, and there exist functions of $\theta$ (but independent of e),
A and B, such that

(6) $$\varphi_\theta' = A + B\varphi_\theta.$$

Under these assumptions Neyman has shown

A. that the probability density function $p_\theta$ is of the form

(7) $$p_\theta(e) = \exp\{P(\theta) + T(e) \cdot Q(\theta) + R(e)\}$$

where Q is a monotone function with $\frac{d}{d\theta} Q(\theta) \big|_{\theta = \theta_0} \ne 0$ (without loss of generality
we shall assume Q monotonely increasing) and

B. that the type $A_1$ test of the hypothesis H exists, and is given by

(8) $$T(e) < c_1, \qquad T(e) > c_2$$

for suitable choice of $c_1$ and $c_2$.

In what follows we shall assume that the permissible first kind error in testing
H is fixed throughout and has the value $\varepsilon$. By a test w of H we shall always
mean a test of level of significance $\varepsilon$, i.e. satisfying

(9) $$\int_w p_{\theta_0}(e)\,de = \varepsilon.$$

Let us consider the family of tests

(10) $$w(k): \quad T(e) < k, \quad T(e) > f(k); \qquad k < f(k)$$

where f(k) is determined by (9). It easily follows from (9) that k can take on
all values from $-\infty$ to $k_0$, say, where $k_0$ is such that

(11) $$f(k_0) = +\infty.$$

For the family F of tests $\{w(k)\}$, $-\infty \le k \le k_0$, we now state

Theorem 1. All members of F are admissible, and if w is any admissible test
not in F, there exists a member of F which has power identical with that of w.

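We first prove the

Lemma. Let $\beta_w$ denote the power function of a test w. Then if $k_1 < k_2$

(12)
$$\beta_{w(k_1)}(\theta) \le \beta_{w(k_2)}(\theta) \quad \text{if } \theta < \theta_0$$
$$\beta_{w(k_1)}(\theta) \ge \beta_{w(k_2)}(\theta) \quad \text{if } \theta > \theta_0.$$

Proof: Let $\bar{w}$ denote the complement of a region w. Consider the intervals

(13)
$$I = w(k_1) \cdot \bar{w}(k_2)$$
$$J = \bar{w}(k_1) \cdot w(k_2).$$

I lies entirely to the right of J. Let $\theta > \theta_0$. Then

(14) $$\frac{p_\theta(e)}{p_{\theta_0}(e)} = C(\theta) \exp\{T(e)[Q(\theta) - Q(\theta_0)]\}$$

is a strictly increasing function of T since Q is increasing. Therefore there exists a
constant C such that

(15)
$$\frac{p_\theta(e)}{p_{\theta_0}(e)} \le C \quad \text{if } T(e) \text{ is in } J$$
$$\frac{p_\theta(e)}{p_{\theta_0}(e)} \ge C \quad \text{if } T(e) \text{ is in } I.$$

Since

(16) $$\int_{w(k_1)} p_{\theta_0}(e)\,de = \int_{w(k_2)} p_{\theta_0}(e)\,de$$

we have

(17) $$\int_{T(e) \in I} p_{\theta_0}(e)\,de = \int_{T(e) \in J} p_{\theta_0}(e)\,de$$

and therefore

(18) $$\int_{T(e) \in J} p_\theta(e)\,de \le C \cdot \int_{T(e) \in J} p_{\theta_0}(e)\,de = C \cdot \int_{T(e) \in I} p_{\theta_0}(e)\,de \le \int_{T(e) \in I} p_\theta(e)\,de$$

from which it follows that

(19) $$\int_{w(k_2)} p_\theta(e)\,de \le \int_{w(k_1)} p_\theta(e)\,de$$

which is the desired result.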
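
The family (10) and the ordering (12) can be illustrated for one concrete case. The sketch below assumes a sample of size n from a normal distribution with unit variance, hypothesis $\theta_0 = 0$, and the statistic $T = \sqrt{n}\,\bar{x}$, which is N(0, 1) under the hypothesis; the function names (`f_of_k`, `power`) are hypothetical and only the standard library is used.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # distribution of T = sqrt(n) * xbar when theta = 0


def f_of_k(k, eps):
    """Upper cut-off f(k) that makes {T < k or T > f(k)} a test of level eps."""
    left = STD.cdf(k)                 # probability rejected on the left
    if eps - left <= 1e-12:
        return math.inf               # k = k_0: nothing is left for the right tail
    return STD.inv_cdf(1.0 - (eps - left))


def power(k, theta, n, eps):
    """Power of w(k) at theta, where T ~ N(sqrt(n) * theta, 1)."""
    shift = math.sqrt(n) * theta
    upper = f_of_k(k, eps)
    right = 0.0 if math.isinf(upper) else 1.0 - STD.cdf(upper - shift)
    return STD.cdf(k - shift) + right


if __name__ == "__main__":
    n, eps = 9, 0.05
    k1, k2 = -2.5, -1.8               # k1 < k2 < k_0 = inv_cdf(eps)
    for theta in (-0.3, 0.0, 0.3):
        print(theta, round(power(k1, theta, n, eps), 4),
              round(power(k2, theta, n, eps), 4))
```

For $\theta < 0$ the test with the larger k has the larger power, and for $\theta > 0$ the order is reversed, in agreement with (12); at $\theta = 0$ both have power $\varepsilon$.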





Proof of Theorem 1. The proof consists of several parts.

I. Let m be any real number, and assume that there exists a value of k such
that

(20) $$\beta_w(\theta_0) = \varepsilon$$

(21) $$\frac{d}{d\theta} \beta_w(\theta) \Big|_{\theta = \theta_0} = m$$

for w = w(k). Then w(k) has power uniformly greater than or equal to that of
any other test satisfying (20) and (21).

For m = 0 this becomes Neyman’s theorem stating that the type A test is
also of type $A_1$. The proof of the theorem however is independent of the value of

(23) $$\frac{d}{d\theta} \beta_w(\theta) \Big|_{\theta = \theta_0}$$

and hence carries over to arbitrary m.

II. If there exists any test satisfying (20) and (21) then there exists a number
k for which w(k) also satisfies (20) and (21).





To prove this let us determine, of all tests satisfying (20), the one which
maximizes

(24) $$\frac{d}{d\theta} \beta_w(\theta) \Big|_{\theta = \theta_0} = \int_w \frac{\partial}{\partial\theta} p_\theta(e) \Big|_{\theta = \theta_0} de.$$

This can be done by means of the lemma of Neyman and Pearson [4, p. 11]
which gives sufficient conditions for a region w, subject to restrictions

(25) $$\int_w f_i(e)\,de = c_i, \qquad (i = 1, \dots, p),$$

to maximize an integral

(26) $$\int_w g(e)\,de.$$

According to this lemma the desired test is of the form

(25) $$\frac{\partial}{\partial\theta} p_\theta(e) \Big|_{\theta = \theta_0} > a \cdot p_{\theta_0}(e)$$

provided a value of a exists for which this test satisfies (20). (25) is equivalent to

(26) $$P'(\theta_0) + T(e) \cdot Q'(\theta_0) > a \quad \text{from (7)}$$

or, since $Q'(\theta_0) > 0$, to

(27) $$T(e) > b.$$

Thus, if a number b exists such that the test (27) satisfies (20), this test is the one
maximizing (24). But such a number does exist, namely $f(-\infty)$. Therefore
$w(-\infty)$ is the desired test.

Similarly it is easy to show that of all tests satisfying (20), $w(k_0)$ minimizes (24).
But

(28) $$\frac{d}{d\theta} \beta_{w(k)}(\theta) \Big|_{\theta = \theta_0} = \int_{T < k,\ T > f(k)} \frac{\partial}{\partial\theta} p_\theta(e) \Big|_{\theta = \theta_0} de$$

is a continuous function of k, and therefore takes on all intermediate values,
which establishes II.

III. From I. and II. we conclude that given any test w there exists a member
of F which has power uniformly greater than or equal to that of w. For let w
be any test of H. From the condition of regularity it follows that its power
function has a derivative at $\theta_0$. By II. there exists a value of k such that the
power function of w(k) has the same slope at $\theta_0$, and from I. it follows that
w(k) is uniformly more powerful than w.

But from the lemma we see that none of the tests w(k) is uniformly more
powerful than any other. Hence all members of F are admissible, and the
theorem is proved.

From the lemma and Theorem 1 we can conclude for all members of F the 
following optimum property: 





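Corollary 1: Let w be any test, and let $w_0$ be any member of F. Then at least
one of the two statements

(29)
$$\beta_w(\theta) \le \beta_{w_0}(\theta) \quad \text{for all } \theta \le \theta_0$$
$$\beta_w(\theta) \le \beta_{w_0}(\theta) \quad \text{for all } \theta \ge \theta_0$$

must hold.

The lemma and Theorem 1 also give the following result concerning most
stringent tests, defined by Wald [1, p. 33].

Corollary 2: There exists a uniformly most powerful of all most stringent tests.
It is that unique member $w_0$ of F for which

$$\underset{\theta < \theta_0}{\text{l.u.b.}} \left[\underset{w}{\text{l.u.b.}}\ \beta_w(\theta) - \beta_{w_0}(\theta)\right] = \underset{\theta > \theta_0}{\text{l.u.b.}} \left[\underset{w}{\text{l.u.b.}}\ \beta_w(\theta) - \beta_{w_0}(\theta)\right].$$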

4. The effect on F of assumptions about the alternatives. Let us next consider
how a restriction in the set of alternatives affects the family F. From the lemma
it follows that there is no change as long as the set of alternatives contains
values of $\theta$ both greater and less than $\theta_0$. On the other hand, if the alternatives
are restricted to values of $\theta$ greater than $\theta_0$, say, the family F for testing H
against these alternatives consists of only a single member, the test $w(-\infty)$
(and similarly for the other one-sided case). This follows from

Theorem 2: Under conditions a. and b. the test $w(-\infty)$ is uniformly most
powerful against the alternatives $\theta > \theta_0$, the test $w(k_0)$ is uniformly most powerful
against the alternatives $\theta < \theta_0$.

Proof: Let w be any test. By Theorem 1 there exists a number k such that

(30) $$\beta_w(\theta) \le \beta_{w(k)}(\theta) \quad \text{for all } \theta.$$

From the lemma it follows that

(31)
$$\beta_{w(k)}(\theta) \le \beta_{w(k_0)}(\theta) \quad \text{if } \theta < \theta_0$$
$$\beta_{w(k)}(\theta) \le \beta_{w(-\infty)}(\theta) \quad \text{if } \theta > \theta_0.$$

Combining (30) and (31) we have the desired result.

(It is also easy to prove Theorem 2 directly from the Neyman-Pearson lemma.)
In order to illustrate how the assumption of an a priori distribution of $\theta$
together with some information about this distribution affects F, let us consider a
special case of the class of hypotheses discussed so far.

Let

(32) $$p_\theta(x_1, \dots, x_n) = c \cdot e^{-\frac{1}{2} \sum (x_i - \theta)^2}$$

so that $E = (X_1, X_2, \dots, X_n)$ is a sample from a normal distribution with unit
variance and unknown mean $\theta$. We want to test the hypothesis

(33) $$H: \theta = 0.$$

We shall show that if $\theta$ has a probability density function g which is symmetric
about the origin, then the family F for testing H consists, as might be expected,
of a single member, the type $A_1$ test.

Our problem is to find the test w satisfying

(34) $$\int_w p_0(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \varepsilon$$

and which maximizes

(35) $$\int_{-\infty}^{\infty} g(\theta) \int_w p_\theta(x_1, \dots, x_n)\,dx_1 \cdots dx_n\,d\theta.$$

Inverting the order of integration, which is permissible in this case, the
Neyman-Pearson lemma shows the desired test to be of the form

(36) $$\int_{-\infty}^{\infty} g(\theta)\,p_\theta(x_1, \dots, x_n)\,d\theta > a \cdot p_0(x_1, \dots, x_n)$$

provided a value of a exists for which (36) satisfies (34). Substituting from
(32), (36) becomes

(37) $$f(\bar{x}) = \int_{-\infty}^{\infty} g(\theta)\,e^{-\frac{n}{2}\theta^2 + n\theta\bar{x}}\,d\theta > a$$

where

(38) $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

Since

(39) $$\frac{d^2}{d\bar{x}^2} f(\bar{x}) > 0$$

the region (37) is either empty, which would contradict (34), or else can be
described by inequalities

(40) $$\bar{x} < a_1, \qquad \bar{x} > a_2$$

where

(41) $$f(a_1) = f(a_2)$$

the latter equation becoming, on substitution from (37)

(42) $$\int_{-\infty}^{\infty} g(\theta)\,e^{-\frac{n}{2}\theta^2} \left(e^{n a_1 \theta} - e^{n a_2 \theta}\right) d\theta = 0.$$

If g is an even function, (42) is certainly satisfied when $a_1 = -a_2$. Our test
then becomes

(43) $$\bar{x} < -a_2, \qquad \bar{x} > a_2$$

which for proper choice of $a_2$ satisfies (34) and is the well known type $A_1$ test.
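
As a numerical illustration of (37) through (43), the sketch below evaluates $f(\bar{x})$ by Simpson's rule for one particular even density g. The choice of g as the standard normal density, the truncation of the integral to $[-8, 8]$, and the function names (`f_bar`, `symmetric_cutoff`) are assumptions made for the example, not part of the argument above.

```python
import math
from statistics import NormalDist


def f_bar(xbar, n, g, half_width=8.0, steps=4000):
    """f(xbar) = integral of g(theta) * exp(-n theta^2 / 2 + n theta xbar),
    approximated by composite Simpson's rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        theta = -half_width + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * g(theta) * math.exp(
            -0.5 * n * theta * theta + n * theta * xbar)
    return total * h / 3


def symmetric_cutoff(n, eps):
    """a_2 with P(|xbar| > a_2) = eps when theta = 0 and the variance is one."""
    return NormalDist().inv_cdf(1 - eps / 2) / math.sqrt(n)


if __name__ == "__main__":
    n, eps = 9, 0.05
    g = NormalDist().pdf              # an even prior density (assumption)
    a2 = symmetric_cutoff(n, eps)
    # f is even and increases with |xbar|, so {f > a} is {|xbar| > a_2}.
    print(round(a2, 4), f_bar(-a2, n, g), f_bar(a2, n, g))
```

For this g the integral has the closed form $f(\bar{x}) = (n+1)^{-1/2} \exp\{n^2\bar{x}^2/[2(n+1)]\}$, which gives an independent check on the quadrature.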





5. Concluding remarks. Let us consider once more a probability density
function satisfying a. and b. We have seen that the family F for testing H
against the alternatives $\theta \ne \theta_0$ contains an infinity of elements unless we make
some additional assumptions. On the other hand, if the principle of unbiasedness
is accepted, F shrinks to a single element: the type $A_1$ test.

But unbiasedness does not insure power. Thus conceivably some other test
might be more powerful than the test chosen, everywhere except in a small
one-sided neighbourhood of $\theta_0$. That this is not so is shown by Corollary 1 to
Theorem 1. This remark illustrates how intuitively appealing principles and a
knowledge of the family F may be used in conjunction to arrive at a choice of
a satisfactory test, when not enough information is available to make the choice
compelling.

Finally, it should be pointed out that although we restricted our considerations 
to simple hypotheses, the notions developed also apply to composite hypotheses. 

REFERENCES 

[1] A. Wald, “On the principles of statistical inference”, Notre Dame Mathematical
Lectures, Number 1.

[2] J. Neyman, “L’estimation statistique traitée comme un problème classique de probabilité”,
Conférences internationales des sciences mathématiques à Genève:
Colloque d’Octobre 1937, sur le Calcul des Probabilités, Paris, 1938.

[3] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses, Part II”, Statistical Research Memoirs, Vol. II, London, 1938.

[4] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses, Part I”, Statistical Research Memoirs, Vol. I, London, 1936.



CONDITIONAL EXPECTATION AND UNBIASED SEQUENTIAL 
ESTIMATION 1 

By David Blackwell 
Howard University 

1. Summary. It is shown that E[f(x) E(y | x)] = E(fy) whenever E(fy)
is finite, and that $\sigma^2 E(y | x) \le \sigma^2 y$, where E(y | x) denotes the conditional
expectation of y with respect to x. These results imply that whenever there is a
sufficient statistic u and an unbiased estimate t, not a function of u only, for a
parameter $\theta$, the function E(t | u), which is a function of u only, is an unbiased
estimate for $\theta$ with a variance smaller than that of t. A sequential unbiased
estimate for a parameter is obtained, such that when the sequential test terminates
after i observations, the estimate is a function of a sufficient statistic for the
parameter with respect to these observations. A special case of this estimate is
that obtained by Girshick, Mosteller, and Savage [4] for the parameter of a
binomial distribution.

2. Conditional expectation. Denote by x any (not necessarily numerical)
chance variable and by y any numerical chance variable for which E(y) is finite.
There exists a function of x, the conditional expectation of y with respect to x
[3, pp. 95-101; 5, pp. 41-44], which we denote, as usual, by E(y | x) and which is
uniquely defined except for events of zero probability, such that whenever f(x)
is the characteristic function of an event F depending only on x (i.e. f = 1 when
F occurs and f = 0 when F does not occur), the equation

(1) E[f(x)E(y | x)] = E[f(x)y]

holds. Now if f(x) is a simple function, i.e. a finite linear combination of characteristic
functions, it is clear from the linearity of expectation that (1) continues
to hold. Quite generally, we shall prove

Theorem 1: The equation (1) holds for every function f(x) for which E[f(x)y]
is finite.

To simplify notation, we write $E(z | x) = E_x z$ for any chance variable z. The
following corollary to Theorem 1 asserts simply that the operations $E_x$ and
multiplication by f(x) are commutative. This fact, which is trivially equivalent
to Theorem 1, has been stated by Kolmogoroff [5, p. 50].

Corollary: If E[f(x)y] is finite, then $E_x[f(x)y] = f(x)E_x y$.

Proof of Corollary: If g(x) is a characteristic function, then $E(gfE_x y) =
E(gfy)$ by Theorem 1. Since $E_x(fy)$ is unique, the Corollary follows.

Proof of Theorem 1: Since Theorem 1 holds when f(x) is a simple function
and the product of a simple function and a characteristic function is a simple
function, the Corollary holds when f(x) is a simple function.

1 The author is indebted to M. A. Girshick for suggesting the problem which led to this 
paper and for many helpful discussions. 




Now let $f(x)$ be any function for which $E(fy)$ is finite. There is a sequence of 
simple functions $f_n(x)$ such that $f_n(x) \to f(x)$ and $|f_n(x)| \le |f(x)|$. For instance 
we may define $f_n(x) = m/n$ when $m/n \le f(x) < (m + 1)/n$, $0 \le m \le n^2$; $f_n(x) 
= m/n$ when $(m - 1)/n < f(x) \le m/n$, $0 \ge m \ge -n^2$; $f_n(x) = 0$ otherwise. 

We recall the following proposition of Doob [2, p. 296]: 

(2)    $|E_x y| \le E_x |y|$ 

with probability one. Then, using the Corollary (for simple functions) and 
(2), we have $|f_n E_x y| = |E_x(f_n y)| \le E_x|f_n y| \le E_x|fy|$. Also 

(3)    $E(f_n E_x y) = E(f_n y)$. 

Since the two sequences of functions $f_n E_x y$, $f_n y$ are bounded in absolute value by 
the summable functions $E_x|fy|$, $|fy|$, Lebesgue's theorem [8, p. 29] applied 
to (3) yields (1). 
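
As an illustration only, equation (1) can be checked by exact arithmetic on a small finite distribution. The joint distribution and the function `f` below are hypothetical choices made for this sketch and do not come from the paper; `f` is deliberately not a characteristic function.

```python
from fractions import Fraction

# Hypothetical joint distribution of (x, y): keys are (x, y), values are probabilities.
joint = {
    ("a", 1): Fraction(1, 8), ("a", 3): Fraction(3, 8),
    ("b", -2): Fraction(1, 4), ("b", 4): Fraction(1, 4),
}

def expectation(g):
    """E[g(x, y)] under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cond_exp_y(x0):
    """E(y | x = x0), computed directly from the definition."""
    mass = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(p * y for (x, y), p in joint.items() if x == x0) / mass

def f(x):
    """An arbitrary (hypothetical) function of x."""
    return {"a": Fraction(5), "b": Fraction(-7, 2)}[x]

lhs = expectation(lambda x, y: f(x) * cond_exp_y(x))  # E[f(x) E(y | x)]
rhs = expectation(lambda x, y: f(x) * y)              # E[f(x) y]
print(lhs, rhs)
```

Both sides agree exactly here, which is what Theorem 1 asserts; a finite example of course says nothing about the measure-theoretic part of the proof.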

In section 3 we shall use the fact that if $u$ is a sufficient statistic for a parameter 
$\theta$ and $f$ is any unbiased estimate for $\theta$, then $E(f \mid u)$ (which, since $u$ is a sufficient 
statistic, is a function of $u$ independent of $\theta$) is an unbiased estimate for $\theta$. This is 
obvious, since it follows from the definition of conditional expectation that the 
two chance variables $f$ and $E(f \mid u)$ have the same expected value. The interesting 
fact is that the estimate $E(f \mid u)$ is always a better estimate for $\theta$ than $f$ in the 
sense of having a smaller variance, unless $f$ is already a function of $u$ only, in 
which case the two estimates $f$ and $E(f \mid u)$ clearly coincide. This is simply the 
fact that the variance of the regression function of $f$ on $u$ is not greater than the 
variance of $f$. In the case of Gaussian variables, where the regression is linear, 
this fact has been noted by Doob [1, p. 231].² Our statement is embodied in 

Theorem 2: If $\sigma^2 y$ is finite, so is $\sigma^2 E_x y$, and $\sigma^2 E_x y \le \sigma^2 y$, with equality holding 
only if $E_x y = y$ with probability one. 

Proof: Denote by $m$ the common expected value of $y$ and $E_x y$. Suppose for 
the moment that $\sigma^2 E_x y$ is finite. By the Schwarz inequality $E[yE_x y]$ is then 
finite. Then $\sigma^2 y = E(y - m)^2 = E[(y - E_x y) + (E_x y - m)]^2 = E(y - E_x y)^2 
+ \sigma^2 E_x y$, since $E[E_x y(E_x y - m)] = E[y(E_x y - m)]$ by Theorem 1. Thus $\sigma^2 y$ 
exceeds $\sigma^2 E_x y$ by $E(y - E_x y)^2$, which is positive unless $y = E_x y$, i.e. $y$ is a function 
of $x$. Thus we obtain the usual decomposition: the variance of $y$ is the 
variance of the regression of $y$ on $x$ plus the variance of $y$ about the regression of 
$y$ on $x$. 
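
The decomposition in this proof can be illustrated numerically. The sketch below uses a hypothetical finite joint distribution (an assumption of the example, not data from the paper) and exact rational arithmetic to compute the three variances.

```python
from fractions import Fraction

# Hypothetical joint distribution of (x, y); values are probabilities.
joint = {
    ("a", 1): Fraction(1, 8), ("a", 3): Fraction(3, 8),
    ("b", -2): Fraction(1, 4), ("b", 4): Fraction(1, 4),
}

def variance_decomposition(joint):
    """Return (variance of y, variance of E_x y, E(y - E_x y)^2)."""
    m = sum(p * y for (_, y), p in joint.items())            # E(y) = E(E_x y)
    var_y = sum(p * (y - m) ** 2 for (_, y), p in joint.items())
    mass, weighted = {}, {}
    for (x, y), p in joint.items():
        mass[x] = mass.get(x, 0) + p
        weighted[x] = weighted.get(x, 0) + p * y
    cond_mean = {x: weighted[x] / mass[x] for x in mass}      # E_x y on each x
    var_regression = sum(mass[x] * (cond_mean[x] - m) ** 2 for x in mass)
    var_about = sum(p * (y - cond_mean[x]) ** 2 for (x, y), p in joint.items())
    return var_y, var_regression, var_about

var_y, var_regression, var_about = variance_decomposition(joint)
print(var_y, var_regression, var_about)
```

For this distribution the variance of $y$ splits exactly into the variance of the regression plus the variance about the regression, so the regression variance cannot exceed the variance of $y$.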

To show that $\sigma^2 E_x y$ is finite, we require the following 

Lemma (Schwarz inequality): If $E(f^2)$ and $E(g^2)$ are finite, then, with 
probability one, 

$E_x^2(fg) \le E_x(f^2)E_x(g^2)$. 

A proof can be constructed on the usual lines by considering the function 
$Q(x, \lambda) = E_x(f + \lambda g)^2$. There are, however, certain measure-theoretic difficulties 


² For functions of finite variance it is possible to interpret conditional expectation as a 
projection in Hilbert space, when the statement becomes simply the Bessel inequality. 





in handling simultaneously the conditional expectations of the family of chance 
variables $(f + \lambda g)^2$; instead we shall give a simple direct proof based on the 
ordinary Schwarz inequality for integrals. 

We may suppose $f \ge 0$, $g \ge 0$ with probability one, since, from (2), 

$E_x^2(fg) \le E_x^2(|f||g|)$ 

with probability one. Unless the Lemma holds there are three positive numbers 
$a$, $b$, $c$ with $a > bc$ for which the event 

$\{E_x^2(fg) > a,\ E_x(f^2) < b,\ E_x(g^2) < c\} = H$ 

has positive probability. Then denoting by $h$ the characteristic function of $H$ 
and using the Schwarz inequality for integrals, we have 

$aP^2(H) < E^2[hE_x(fg)] = E^2(hfg) \le E(hf^2)E(hg^2)$ 

$= E[hE_x(f^2)]E[hE_x(g^2)] < bcP^2(H)$, 

which is impossible. This completes the proof of the Lemma. 

The Lemma, with $f = y$, $g = 1$, yields $E_x^2(y) \le E_x(y^2)$ with probability one, 
which implies the finiteness of $\sigma^2 E_x y$ and hence completes the proof of Theorem 2. 

3. Unbiased sequential estimation. Consider a chance variable $z$ whose 
distribution depends on a parameter $\theta$. If we have an unbiased estimate $t(z)$ 
and a sufficient statistic $u(z)$ (not necessarily a single numerical chance variable) 
for $\theta$, then, as mentioned in section 2, $v(u) = E(t \mid u)$ is an unbiased estimate for $\theta$ 
depending only on $u$.³ We have shown that the variance of $v$ is never greater 
than that of $t$, and we shall see that it is sometimes much smaller (see example II 
at the end of this section). The estimate obtained in this section for the parameter 
of a sequential process is of the $v$ type; its importance lies in the fact that 
in many cases there is an unbiased estimate $t$ (generally poor) which is a function 
of the first observation, and which will consequently be an unbiased estimate no 
matter what sequential test procedure is used. 

Let $x_1, x_2, \cdots$ be a sequence of chance variables whose joint distribution is 
determined by an unknown point $\theta$ in a parameter space. A sequential sample 
(test) [9] is determined by specifying a sequence of mutually exclusive events 
$S_1, S_2, \cdots$, where $S_i$ depends only on $x_1, \cdots, x_i$ and 

(4)    $\sum_{i=1}^{\infty} P(S_i) = 1$ for all $\theta$. 

The event $S_i$ is that sampling stops after the $i$th observation, and (4) ensures that 
sampling stops eventually. Thus if we define the chance variable $n = i$ when $S_i$ 
occurs, $n$ is the size of the sample. 

³ It was pointed out by the referee that, strictly speaking, $u$ does not have to be sufficient; 
it is necessary only that $v(u)$ be independent of $\theta$. The author is indebted to the referee for 
many valuable suggestions. 





Denote by $u_1, u_2, \cdots$ any sequence of chance variables such that $u_i = 
u_i(x_1, \cdots, x_i)$ is a sufficient statistic for estimating $\theta$ from $x_1, \cdots, x_i$. There 
will of course be many such sequences $\{u_i\}$, but it often happens that there is 
one which arises in a natural way from the sequential process; if we are sampling 
from a binomial population, for instance, $u_i$ = number of defectives in the first $i$ 
observations is a sufficient statistic. We shall suppose that the sequential test 
satisfies the following condition 

(6)    $S_i = W_i\,C(S_1 + \cdots + S_{i-1})$,⁴ 

where $W_i$ is an event depending on $u_i$ only. This condition means that when 
the $i$th observation is taken, the decision to stop at this point depends only on 
the $i$th sufficient statistic $u_i$. For the binomial example mentioned above, this 
means that the decision to stop after $i$ observations depends only on the number 
of defectives observed at that stage, and not on the order in which they were 
observed. The Neyman criterion for $u_i$ to be a sufficient statistic [7; 10, p. 135] 
shows that (6) is no restriction whatever for the sequential probability ratio 
test [9] since the ratio in terms of which the test is defined will be a function of 
$u_i$ only. 

Let $t_1, t_2, \cdots$ be any sequence of chance variables such that $t_i$ is a function of 
$x_1, \cdots, x_i$; define $t = t_i$ when $S_i$ occurs. If $E(t) = \theta$, $t$ is said to be an unbiased 
estimate for $\theta$ (relative to the particular sequential test $\{S_i\}$). The theory of 
sequential sampling has been formulated primarily for testing hypotheses; a 
problem which arises naturally and often is the following: After a sequential 
sample has been obtained, is there an unbiased estimate for $\theta$? Since a sample 
of constant size is a special case of a sequentially selected sample, we cannot 
hope to find unbiased estimates for arbitrary sequential samples unless such 
estimates exist for samples of every constant size. This is equivalent to the 
existence of a function $t(x_1)$ for which $E(t) = \theta$ for all $\theta$. Our problem is to 
discover an unbiased estimate for $\theta$ which, when $n = i$, is a function of $u_i$ alone. 
Such an estimate has been found by Girshick, Mosteller, and Savage [4] for 
sequential samples from a binomial population. It turns out that whenever 
there is any unbiased estimate at all for a particular sequential test, there is 
also one of the type described. Thus, if there is an unbiased estimate $t$ for 
samples of fixed size $N$, there will be an unbiased estimate of the type described 
for every sequential test requiring at least $N$ observations, since $t$ is itself an 
unbiased estimate for such sequential tests. 

Denote by $t$ any unbiased estimate for $\theta$ relative to a particular sequential 
test $\{S_i\}$. Denote by $w_i$, $h_i$ the characteristic functions of the events $W_i$, 
$C(S_1 + \cdots + S_i)$ respectively, and define $u = u_i$, $v = E(h_{i-1}t_i \mid u_i)/E(h_{i-1} \mid u_i)$ 
when $n = i$. To justify the definition of $v$ we remark that the event $\{n = i$, 
$E(h_{i-1} \mid u_i) = 0\}$ has probability zero, since $qh_{i-1} \le h_{i-1}$ with probability one, 
where $q$ is the characteristic function of the event $\{E(h_{i-1} \mid u_i) > 0\}$, while 


⁴ For any event $A$, $C(A)$ denotes the event that $A$ does not occur. 





$E(qh_{i-1}) = E[qE(h_{i-1} \mid u_i)] = E[E(h_{i-1} \mid u_i)] = E(h_{i-1})$. 

Since $u_i$ is a sufficient statistic for $\theta$ with respect to $x_1, \cdots, x_i$, $v$ is a function of 
$u$ and $n$ only, independent of $\theta$. The main result of this section is 

Theorem 3. $v$ is an unbiased estimate for $\theta$. 

Proof: We shall show that $v = E(t \mid u, n)$. This not only shows that $v$ is an 
unbiased estimate for $\theta$, but also interprets $v$ in a very simple way and, as mentioned 
above, implies that the variance of $v$ does not exceed that of $t$. It must 
be verified that for every event $D$ depending only on $n$ and $u$, $E(dv) = E(dt)$, 
where $d$ is the characteristic function of $D$. Now $D = \sum_{i=1}^{\infty} DS_i$, and $DS_i = D_iS_i$, 
where $D_i$ is an event depending only on $u_i$. It is sufficient, then, to show 
$E(d_iw_ih_{i-1}v) = E(d_iw_ih_{i-1}t)$, where $d_i$ is the characteristic function of $D_i$. Now 

$E(d_iw_ih_{i-1}v) = E[d_iw_ih_{i-1}E(h_{i-1}t_i \mid u_i)/E(h_{i-1} \mid u_i)]$, 

using the definition of $v$. The function in brackets is $h_{i-1}$ multiplied by a function 
of $u_i$; by Theorem 1 its expectation is unaltered if $h_{i-1}$ is replaced by $E(h_{i-1} \mid u_i)$. 
Thus the right member of the last equality equals 

$E[d_iw_iE(h_{i-1}t_i \mid u_i)] = E(d_iw_ih_{i-1}t_i) = E(d_iw_ih_{i-1}t)$. 

We conclude with two examples: 

I. Binomial and Poisson distributions. Suppose $x_1, x_2, \cdots$ are independent 
with identical distributions, either binomial or Poisson, with parameter 
$\theta$. Then $t = x_1$ ($= t_i$ for all $i$) is an unbiased estimate for $\theta$, and it is well known 
that $u_i = x_1 + \cdots + x_i$ is a sufficient statistic for estimating $\theta$ from $x_1, \cdots, x_i$. 
For any sequential test satisfying (6) our unbiased estimate for $\theta$ will be 

$v = \dfrac{E(h_{i-1}x_1 \mid u_i = u)}{E(h_{i-1} \mid u_i = u)} = \dfrac{E(h_{i-1}x_1f)}{E(h_{i-1}f)}$ 

when $n = i$, $u_i = u$, where $f$ is the characteristic function of the event $u_i = u$. 
Then 

$v = \dfrac{\sum_{j=0}^{\infty} j\,k_j(u, i)}{\sum_{j=0}^{\infty} k_j(u, i)}$    for Poisson, 

$v = \dfrac{k_1(u, i)}{\sum_{j=0}^{1} k_j(u, i)}$    for binomial, 

where $k_j(u, i)$ denotes the number of possible sequences $x_1, \cdots, x_i$ for which 
$n \ge i$, $x_1 + \cdots + x_i = u$, and $x_1 = j$. For the binomial case, this is the estimate 
found in [4]. 
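
For the binomial case the path-counting form of $v$ can be sketched in a few lines. The stopping plan used below (stop at the second defective or the second non-defective, so that at most three observations are taken) is a hypothetical example chosen for illustration, and the function names are likewise assumptions of this sketch. The stopping rule depends only on $i$ and $u_i$, in the spirit of condition (6).

```python
from fractions import Fraction

def path_counts(stop, max_n):
    """Return k with k[j][(i, u)] = number of 0/1 sequences x_1..x_i having
    x_1 = j and x_1 + ... + x_i = u along which sampling has not stopped
    before the i-th observation (so n >= i)."""
    k = {0: {(1, 0): 1}, 1: {(1, 1): 1}}
    for j in (0, 1):
        frontier = dict(k[j])
        for _ in range(1, max_n):
            nxt = {}
            for (i, u), count in frontier.items():
                if stop(i, u):          # sampling ended here; no continuation
                    continue
                for x in (0, 1):
                    key = (i + 1, u + x)
                    nxt[key] = nxt.get(key, 0) + count
            k[j].update(nxt)
            frontier = nxt
    return k

def estimate(k, i, u):
    """v = k_1(u, i) / (k_0(u, i) + k_1(u, i)) at the stopping point (i, u)."""
    k0 = k[0].get((i, u), 0)
    k1 = k[1].get((i, u), 0)
    return Fraction(k1, k0 + k1)

def expected_estimate(stop, max_n, p):
    """Exact total stopping probability and E(v) when P(x = 1) = p."""
    k = path_counts(stop, max_n)
    total, mean = Fraction(0), Fraction(0)
    for (i, u) in set(k[0]) | set(k[1]):
        if not stop(i, u):
            continue
        paths = k[0].get((i, u), 0) + k[1].get((i, u), 0)
        prob = paths * p**u * (1 - p)**(i - u)
        total += prob
        mean += prob * estimate(k, i, u)
    return total, mean

def stop_rule(i, u):
    # Hypothetical plan: stop at the second defective or the second non-defective.
    return u == 2 or i - u == 2

total, mean = expected_estimate(stop_rule, max_n=3, p=Fraction(1, 3))
print(total, mean)
```

With this plan the stopping probabilities sum to one and the expected value of $v$ equals $p$ exactly, as Theorem 3 requires. `max_n` must be large enough that every path has stopped by then; the sketch does not check this.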

II. Samples of constant size. We consider the special case where a 
sample of constant size $N$ is selected, $x_1, \cdots, x_N$ are independent with identical 
distributions, and the density function for $x_i$ has the form 

(7)    $p(x, \theta) = r(\theta)\,s(\theta)^{w(x)}\,q(x)$ 

considered by Koopman [6].⁵ Suppose further that there is an unbiased estimate 
$t(x_1)$ for $\theta$. These conditions will be satisfied, for instance, if $\theta$ is the mean of a 
binomial, Poisson, or normal distribution, with $w(x) = t(x) = x$. Then $u_N 
= w(x_1) + \cdots + w(x_N)$ is a sufficient statistic. Our estimate $v$ becomes simply 
$v = E[t(x_1) \mid u_N]$. Now $E[t(x_1) \mid u_N] = \cdots = E[t(x_N) \mid u_N]$, since $u_N$ is a symmetric 
function of $x_1, \cdots, x_N$, which are independent with identical distributions. 
Consequently 

$v = E\left[\sum_{j=1}^{N} t(x_j)/N \,\Big|\, u_N\right]$, 

so that 

$\sigma^2(v) \le \sigma^2\left(\sum_{j=1}^{N} t(x_j)/N\right) = \sigma^2[t(x_1)]/N$. 

In the special case $w(x) = t(x) = x$, we have $v = \sum_{j=1}^{N} x_j/N$; our estimate is 
simply the mean of the $N$ observations $x_1, \cdots, x_N$. 
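
The special case $w(x) = t(x) = x$ can be illustrated by brute-force enumeration. The sketch below assumes, for illustration only, a population taking the values 0 and 1 with $P(x = 1) = p$; the function name and the chosen $N$ and $p$ are hypothetical. It computes $E[x_1 \mid u_N]$ from the definition rather than assuming that it equals the sample mean.

```python
from fractions import Fraction
from itertools import product

def rao_blackwell_bernoulli(N, p):
    """Enumerate all 0/1 samples of size N and return
    (v, E(v), variance of t = x_1, variance of v), where v[u] = E[x_1 | u_N = u]."""
    prob = {s: p**sum(s) * (1 - p)**(N - sum(s))
            for s in product((0, 1), repeat=N)}
    mass, weighted = {}, {}
    for s, pr in prob.items():
        u = sum(s)                                  # sufficient statistic u_N
        mass[u] = mass.get(u, 0) + pr
        weighted[u] = weighted.get(u, 0) + pr * s[0]
    v = {u: weighted[u] / mass[u] for u in mass}    # E[x_1 | u_N = u]
    mean_t = sum(pr * s[0] for s, pr in prob.items())
    var_t = sum(pr * (s[0] - mean_t) ** 2 for s, pr in prob.items())
    mean_v = sum(mass[u] * v[u] for u in mass)
    var_v = sum(mass[u] * (v[u] - mean_v) ** 2 for u in mass)
    return v, mean_v, var_t, var_v

p = Fraction(1, 3)
v, mean_v, var_t, var_v = rao_blackwell_bernoulli(4, p)
print(v, mean_v, var_t, var_v)
```

In this example $v$ comes out as $u/N$, remains unbiased, and has variance $p(1 - p)/N$ against $p(1 - p)$ for the crude estimate $t = x_1$.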

REFERENCES 

[1] J. L. Doob, "The elementary Gaussian processes," Annals of Math. Stat., Vol. 15 
(1944), pp. 229-282. 

[2] J. L. Doob, "The law of large numbers for continuous stochastic processes," Duke 
Math. Jour., Vol. 6 (1940), pp. 290-306. 

[3] J. L. Doob, "Stochastic processes with an integral-valued parameter," Trans. Amer. 
Math. Soc., Vol. 44 (1938), pp. 87-150. 

[4] M. A. Girshick, Frederick Mosteller, and L. J. Savage, "Unbiased estimates for 
certain binomial sampling problems, with applications," Annals of Math. Stat., 
Vol. 17 (1946), pp. 13-23. 

[5] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der 
Mathematik, Vol. 2 (1933). 

[6] B. O. Koopman, "On distributions admitting a sufficient statistic," Trans. Amer. 
Math. Soc., Vol. 39 (1936), pp. 399-409. 

[7] J. Neyman, Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1934), pp. 320-334. 

[8] S. Saks, Theory of the Integral, Stechert, 1937. 

[9] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16 
(1945), pp. 117-186. 

[10] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943. 

⁵ It has been shown by Koopman [6] that if there is a sufficient statistic satisfying certain 
regularity conditions, the density function for $x$ must be of the form (7). 



THE DISTRIBUTION OF THE MEAN 
By E. L. Welker 
University of Illinois 

1. Summary. Both population and sample mean distributions can be represented 
or approximated by Pearson curves if the first four moments of the 
population are finite. Using the $\alpha_3^2, \delta$ chart of Craig [2] to determine the Pearson 
curve type for the population, an analogous $\bar\alpha_3^2, \bar\delta$ chart is derived for the distribution 
of the mean. This defines a one to one transformation of $\alpha_3^2, \delta$ into 
$\bar\alpha_3^2, \bar\delta$. The properties of this transformation are used to discuss the approach 
to normality of the distribution of the mean as dictated by the central limit 
theorem. This is facilitated by superposing on the $\alpha_3^2, \delta$ chart the $\bar\alpha_3^2, \bar\delta$ charts 
for samples of 2, 5, and 10. 

2. Introduction. For any given distribution function of a population, a 
method is available for finding the distribution function of the mean, when it 
exists, that depends on characteristic functions and the Fourier integral theorem. 
For example, characteristic functions have been used to show that the arithmetic 
mean of samples from a normal population is normally distributed and, with minor 
restrictions on non-normal populations, that it is asymptotically normal. The method 
depends, of course, on a knowledge of the exact population distribution. 
Some authors have discussed the approximation of the distributions of sample 
means in special cases by one of the Pearson curves. It is the purpose of this 
paper to consider the complete range of Pearson curves as populations to be 
sampled, then to give the sampling distributions of the mean as approximated 
by the Pearson system, and to discuss the manner in which the distribution of 
the mean approaches the normal curve as dictated by the central limit theorem. 
Since the choice of a Pearson curve depends only on moment relationships, this 
will include the approximation of the distribution of the mean for any parent 
population as based on its moments. Both an algebraic and a graphic analysis 
will be given. 

3. Seminvariant and moment relationships. Denote by $\alpha_k$ the $k$th order 
moment of the population with zero mean and unit variance. Let $\lambda_k$ be the $k$th 
order seminvariant of the population. Let $\bar\alpha_k$ and $\bar\lambda_k$ be the same parameters 
of the distribution of $\bar x$, the mean of a random sample of size $N$ drawn from this 
parent population. Using properties of the seminvariants of linear functions of 
variables independent in the probability sense, formulas relating these parameters 
[1] are 

$\bar\alpha_3^2 = \alpha_3^2 N^{-1}$, 

and 

$\bar\alpha_4 = [\alpha_4 + 3(N - 1)]N^{-1}$. 


4. The Pearson system of curves and the distribution of the mean. The 
determination of the Pearson curve will be made in accordance with the scheme 
discussed by C. C. Craig [2]. In this system the curve type is fixed by the 
moment $\alpha_3$ and the constant 

$\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$. 

The scheme for determining the type of curve is shown graphically in Fig. 1 in 
which the $\alpha_3^2, \delta$ plane is divided into areas in which the Pearson curve types are 
noted. The bounding $\alpha_3^2, \delta$ curves are 

$\delta = -1$, $\delta = -\tfrac12$, $\delta = 0$, $\delta = \tfrac25$, $\alpha_3^2 = 0$, 

$\alpha_3^2 = 4\delta(\delta + 2)$, and $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta)$. 

Let $\bar\delta$ denote the value of the $\delta$ function for the distribution of the mean. Then 

$\bar\delta = \dfrac{2\bar\alpha_4 - 3\bar\alpha_3^2 - 6}{\bar\alpha_4 + 3}$. 




In terms of moments of the parent population 

$\bar\delta = \dfrac{2\left[\dfrac{\alpha_4 + 3(N - 1)}{N}\right] - \dfrac{3\alpha_3^2}{N} - 6}{\dfrac{\alpha_4 + 3(N - 1)}{N} + 3} = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3 + 6(N - 1)}$. 

We see that $\bar\delta = \delta$ for $N = 1$, and $|\bar\delta| < |\delta|$ for $N > 1$ whenever $\delta \ne 0$. Both $\bar\delta$ and $\bar\alpha_3$ approach 
zero as $N$ approaches infinity. These are the values of the constants for the 
normal function. This result is expected from the central limit theorem. 
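
The relations of sections 3 and 4 are easy to evaluate directly. The following sketch uses hypothetical function names and hypothetical population moments ($\alpha_3^2 = 1$, $\alpha_4 = 4$) chosen only for illustration. It computes $\bar\delta$ in two ways: from the moments of the mean, and from the closed form in terms of the population moments.

```python
from fractions import Fraction

def delta(alpha3_sq, alpha4):
    """Craig's constant: (2*a4 - 3*a3^2 - 6) / (a4 + 3)."""
    return (2 * alpha4 - 3 * alpha3_sq - 6) / (alpha4 + 3)

def mean_moments(alpha3_sq, alpha4, N):
    """Standardized third (squared) and fourth moments of the mean of N observations."""
    return alpha3_sq / N, (alpha4 + 3 * (N - 1)) / N

def delta_bar(alpha3_sq, alpha4, N):
    """Closed form for delta of the mean in terms of population moments."""
    return (2 * alpha4 - 3 * alpha3_sq - 6) / (alpha4 + 3 + 6 * (N - 1))

# Hypothetical population moments (exact rationals keep the comparison exact).
a3_sq, a4 = Fraction(1), Fraction(4)
for N in (1, 2, 5, 10):
    a3_sq_bar, a4_bar = mean_moments(a3_sq, a4, N)
    print(N, a3_sq_bar, delta(a3_sq_bar, a4_bar), delta_bar(a3_sq, a4, N))
```

The two computations of $\bar\delta$ agree for every $N$, and both $\bar\alpha_3^2$ and $\bar\delta$ shrink toward zero as $N$ grows.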


5. The $\bar\alpha_3^2, \bar\delta$ diagram for varying sample size. For every given population 
with finite moments of orders 1 through 4 there exists a Pearson curve representing 
or approximating its distribution. This determines a point in the $\alpha_3^2, \delta$ 
plane. For a given sample size, $N$, there corresponds a point in the $\bar\alpha_3^2, \bar\delta$ plane. 
If the point $(\bar\alpha_3^2, \bar\delta)$ is now plotted on the $\alpha_3^2, \delta$ plane, we can determine the type 
of Pearson curve which is needed to approximate the distribution of $\bar x$. The 
transformation of $\alpha_3^2, \delta$ into $\bar\alpha_3^2, \bar\delta$ enables us to analyze the relationship between 
population distributions and distributions of $\bar x$. The transforms of the boundary 
curves in the $\alpha_3^2, \delta$ plane will constitute an $\bar\alpha_3^2, \bar\delta$ chart corresponding to the one 
for $\alpha_3^2, \delta$ shown in Fig. 1. In studying the approach to normality of the distribution 
of $\bar x$, it is illuminating to superimpose this $\bar\alpha_3^2, \bar\delta$ chart on the $\alpha_3^2, \delta$ chart. 
In order to do this, it is necessary to make certain algebraic changes in the 
equations. 

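First eliminate $\alpha_4$ from the formula for $\bar\delta$ as follows. From 

$\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$ 

we find 

$\alpha_4 = \dfrac{3\delta + 3\alpha_3^2 + 6}{2 - \delta}$. 

Substitute this in the expression for $\bar\delta$. Then 

$\bar\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3 + 6(N - 1)} = \dfrac{\delta(\alpha_3^2 + 4)}{\alpha_3^2 + 4 + 2(N - 1)(2 - \delta)}$. 

This formula, in conjunction with 

$\alpha_3^2 = N\bar\alpha_3^2$ 

enables us to write the transformations of the boundary curves. 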


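Boundary Curve                  Transformed Curve 

$\delta = -1$                   $\bar\delta = \dfrac{-(N\bar\alpha_3^2 + 4)}{N\bar\alpha_3^2 + 4 + 6(N - 1)}$ 

$\delta = -\tfrac12$            $\bar\delta = \dfrac{-(N\bar\alpha_3^2 + 4)}{2(N\bar\alpha_3^2 + 4) + 10(N - 1)}$ 

$\delta = 0$                    $\bar\delta = 0$ 

$\delta = \tfrac25$             $\bar\delta = \dfrac{2(N\bar\alpha_3^2 + 4)}{5(N\bar\alpha_3^2 + 4) + 16(N - 1)}$ 

$\alpha_3^2 = 4\delta(\delta + 2)$ 
                                $\bar\alpha_3^2[N\bar\alpha_3^2 + 4 + 2\bar\delta(N - 1)]^2$ 
                                $= 4\bar\delta(\bar\alpha_3^2 + 4)[\bar\delta(N\bar\alpha_3^2 + 8N - 4) + 2N\bar\alpha_3^2 + 8]$ 

$(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta)$ 
                                $[\bar\delta(16N + 3N\bar\alpha_3^2 - 4) + 2N\bar\alpha_3^2 + 8][N\bar\alpha_3^2 + 4 + 2\bar\delta(N - 1)]^2 N\bar\alpha_3^2$ 
                                $= 4[\bar\delta(2N\bar\alpha_3^2 + 10N - 2) + N\bar\alpha_3^2 + 4]^2[\bar\delta(N\bar\alpha_3^2 + 8N - 4) + 2N\bar\alpha_3^2 + 8]$ 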
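
As a rough check on the table, the transformation can be applied to sample points on two of the boundary curves and the images substituted into the transformed equations. The function name `transform` and the test points are assumptions of this sketch; exact rationals are used so that the comparisons are exact.

```python
from fractions import Fraction

def transform(alpha3_sq, delta, N):
    """Map a population point (alpha_3^2, delta) to the point for means of N."""
    a_bar = alpha3_sq / N
    d_bar = delta * (alpha3_sq + 4) / (alpha3_sq + 4 + 2 * (N - 1) * (2 - delta))
    return a_bar, d_bar

N = 2

# A point on the boundary delta = -1 (alpha_3^2 = 3 is an arbitrary choice).
a_bar, d_bar = transform(Fraction(3), Fraction(-1), N)
row1_rhs = -(N * a_bar + 4) / (N * a_bar + 4 + 6 * (N - 1))

# A point on the boundary alpha_3^2 = 4*delta*(delta + 2), taking delta = 1/2.
d = Fraction(1, 2)
a2_bar, d2_bar = transform(4 * d * (d + 2), d, N)
curve_lhs = a2_bar * (N * a2_bar + 4 + 2 * d2_bar * (N - 1)) ** 2
curve_rhs = 4 * d2_bar * (a2_bar + 4) * (
    d2_bar * (N * a2_bar + 8 * N - 4) + 2 * N * a2_bar + 8)

print(d_bar, row1_rhs, curve_lhs, curve_rhs)
```

Both images satisfy the corresponding transformed equations for $N = 2$; other rows and sample sizes can be checked the same way.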


Fig. 2. The $\alpha_3^2, \delta$ and $\bar\alpha_3^2, \bar\delta$ Charts. $N = 2$ 


Fig. 2 shows the chart for distributions of $\bar x$ for $N = 2$ by dashed curves 
superimposed on the chart for the population shown by the solid curves, and 
Fig. 3 consists of the same curves for $N = 5$ and $N = 10$. The intervals on the 
population values are $0 \le \alpha_3^2 \le 12$ and $-1 \le \delta \le .4$ in Fig. 2, but only part 
of the $\alpha_3^2$ range is shown in Fig. 3. In each case the curves for the distribution 
of $\bar x$ cover the interval for $\bar\alpha_3^2, \bar\delta$ which corresponds to the entire interval shown 
for the population in Fig. 2. Population curves are identified by capital letters 
and the corresponding curves for the distribution of $\bar x$ by the corresponding lower 
case letters. 





Before discussing the Pearson curve relationships disclosed by these graphs, 
let us analyze some of the geometric properties of the transformation itself. 
Let $N$ be considered as the parameter defining families of curves in the $\bar\alpha_3^2, \bar\delta$ 
plane corresponding to $\alpha_3^2$ = constant and $\delta$ = constant, the systems of lines 
parallel to the coordinate axes. The transform of $\alpha_3^2 = k$ is $\bar\alpha_3^2 = k/N$, a system 
of lines perpendicular to $\bar\delta = 0$, and approaching $\bar\alpha_3^2 = 0$ with increasing $N$ at 
the rate $kN^{-2}$. The line $\alpha_3^2 = 0$ is invariant under the transformation, but it is 
not pointwise invariant. 


Fig. 3. The $\alpha_3^2, \delta$ and $\bar\alpha_3^2, \bar\delta$ Charts 


The transform of $\delta = C$ is 

$\bar\delta = \dfrac{C(N\bar\alpha_3^2 + 4)}{N\bar\alpha_3^2 + 4 + 2(N - 1)(2 - C)} = \dfrac{C(\bar\alpha_3^2 + 4/N)}{\bar\alpha_3^2 + [4 + 2(N - 1)(2 - C)]N^{-1}}$. 

Solving for $\bar\alpha_3^2$, this becomes 

$\bar\alpha_3^2 = \dfrac{4C - \bar\delta[4 + 2(N - 1)(2 - C)]}{N(\bar\delta - C)}$. 

Except for the straight line $\bar\delta = 0$, obtained when $C = 0$, this is a system of 
rectangular hyperbolas with asymptotes 





$\bar\alpha_3^2 = -[4 + 2(N - 1)(2 - C)]N^{-1}$ and $\bar\delta = C$. 

We are concerned only with the range $\bar\alpha_3^2 \ge 0$. Hence 

$-[4 + 2(N - 1)(2 - C)]N^{-1}$ 

must be positive for the asymptote to show on the diagram. Since $|\delta| < 2$, 
and thus $|C| < 2$, the expression in brackets is necessarily positive. Hence 
the vertical asymptote is always outside the range of interest and will not show 
on the diagram. However the horizontal asymptotes, $\bar\delta = C$, do appear in all 
cases. The hyperbolas are concave downward if $C > 0$ and are concave upward 
if $C < 0$. 

Lines of the pencil $\delta = m\alpha_3^2$ are transformed into the hyperbolas 

$\bar\delta = \dfrac{m\bar\alpha_3^2(N\bar\alpha_3^2 + 4)}{\bar\alpha_3^2 - 2m\bar\alpha_3^2(N - 1) + 4}$ 

for $N > 1$. It is clear that $(0, 0)$ is the only invariant point. Every point on 
$\delta = m\alpha_3^2$ is transformed into a point closer to the origin, the square of the distance 
from the origin changing from 

$(m^2 + 1)\alpha_3^4$ to $(m^2 + 1)\alpha_3^4 N^{-2}$. 

It is easily verified that the hyperbolas are asymptotic to 

$\bar\delta = \dfrac{mN\bar\alpha_3^2}{1 - 2m(N - 1)} - \dfrac{4m(N - 1)(1 + 2m)}{[1 - 2m(N - 1)]^2}$ and $\bar\alpha_3^2 = \dfrac{-4}{1 - 2m(N - 1)}$. 

As $N$ approaches infinity, these asymptotes approach 

$\bar\delta = -\dfrac{\bar\alpha_3^2}{2}$ and $\bar\alpha_3^2 = 0$. 

An area in quadrant one (four) in the $\alpha_3^2, \delta$ plane is transformed into an area in 
quadrant one (four) in the $\bar\alpha_3^2, \bar\delta$ plane. The transformed area is nearer the 
origin. 


6. Types of Pearson curves for distribution of sample means. Examination 
of the graphs in conjunction with the above described properties of the transformation 
shows the following facts regarding the distribution of means of 
samples drawn from populations identified by $\alpha_3^2$ and $\delta$. First consider the 
normal function and the three main Pearson types only. 

Parent Population        Distribution of Sample Means 
Normal                   Normal 
I_L                      I_L 
I_J                      I_J and I_L 
I_U                      I_U, I_J and I_L 
IV                       IV 
VI_L                     VI_L and IV 
VI_J                     VI_J, VI_L and IV 




The transition types were disregarded completely in the above analysis. It is 
worth noting that, disregarding type X, III is transformed into III, VII into 
VII, II_U into II_L, never into II_U, V into IV, but never into V. Type X is 
transformed into type III, never into X. Others follow a similar pattern. 

These moment relationships on the distribution of the mean are not sufficient 
conditions in general. In special cases they are, for example the normal distribution 
and the type III (see [3]). They do represent the best approximation 
curve as specified by the Pearson system. We know that in some cases, for 
example type II (see [3]), the distribution of means is not described by a Pearson 
curve. It is clear, however, that the approach to normality is indicated analytically 
by the transformation $\alpha_3^2, \delta$ to $\bar\alpha_3^2, \bar\delta$ and is shown graphically by the 
$\bar\alpha_3^2, \bar\delta$ diagram. Skewness and kurtosis in the parent population are reflected 
in the distribution of the mean in small samples. A symmetric distribution 
of the mean requires a symmetric parent population regardless of sample size, 
but the degree of skewness decreases rapidly with an increasing number in the 
sample. The Pearson curve which approximates the distribution of $\bar x$ from a 
bell-shaped parent population is also bell-shaped. The Pearson curve approximating 
the distribution of $\bar x$ for samples of $N = 10$ (Fig. 3) is bell-shaped for 
any parent population with values of $\alpha_3^2$ and $\delta$ within the intervals considered. 
For samples of 5 in the same range the approximating curve is either bell-shaped 
or J-shaped, but it is never U-shaped. For samples of 2, even the U-shaped 
distribution is possible, but only with extreme values of $\alpha_3^2$ and $\delta$. The point in 
the $\alpha_3^2, \delta$ plane corresponding to the normal curve is the only invariant point in 
the transformation. Hence parent populations with parameters not satisfying 
$\alpha_3^2 = \delta = 0$ cannot yield normal distributions of sample means. 

REFERENCES 

[1] T. N. Thiele, "The theory of observations," Annals of Math. Stat., Vol. 2 (1931), p. 206. 

[2] C. C. Craig, "A new exposition and chart for the Pearson system of frequency curves," 
Annals of Math. Stat., Vol. 7 (1936), pp. 16-28. 

[3] J. O. Irwin, "On the frequency distribution of the means of samples from a population 
having any law of frequency with finite moments, with special reference to 
Pearson's type II," Biometrika, Vol. 19 (1927), pp. 225-239. 



NOTES 

This section is devoted to brief research and expository articles on methodology 
and other short items. 

ON THE STUDENTIZATION OF SEVERAL VARIANCES 


By B. L. Welch 
University of Leeds , England 


1. Introduction. In a recent paper [1] the author considered the problem 
of eliminating several variances simultaneously from probability statements 
concerning the mean of a normally distributed variable. The general situation 
envisaged was as follows. We supposed that we had an observed quantity $y$ 
which could be assumed to be normally distributed about a population mean 
$\eta$ with variance $\sigma_y^2 = \sum_{i=1}^{k} \lambda_i\sigma_i^2$, where the $\lambda_i$ are known positive numbers and the 
$\sigma_i^2$ unknown population variances. It was supposed further that the data 
provided estimates $s_i^2$ of the $\sigma_i^2$ based on $f_i$ degrees of freedom, and having the 
sampling distributions 

(1)    $p(s_i^2)\,ds_i^2 = \dfrac{1}{\Gamma(\tfrac12 f_i)}\left(\dfrac{f_i s_i^2}{2\sigma_i^2}\right)^{\frac12 f_i - 1}\exp\left\{-\dfrac{f_i s_i^2}{2\sigma_i^2}\right\}\,d\left(\dfrac{f_i s_i^2}{2\sigma_i^2}\right)$ 

and that these estimates were distributed independently of each other and of $y$. 
The problem was to make statements about the magnitude of the difference 
$y - \eta$ which would involve explicitly only the observed variances $s_i^2$. The 
probability of the truth of the statements was also to be entirely independent 
of the population values $\sigma_i^2$. 

The solution was given implicitly in a formal mathematical expression and a 
general process of developing successive terms in a series expansion was de¬ 
scribed. In the present communication a slightly different way of reaching this 
development is provided. 

2. General method. If the $f_i$ are large enough the ratio 

(2)    $v = \dfrac{y - \eta}{\sqrt{\Sigma\lambda_i s_i^2}}$ 

can be taken to be normally distributed with mean zero and standard deviation 
unity. This suggests that, when the $f_i$ are not necessarily large, we might 
approach the matter by seeking some other function 

(3)    $x = x(s_1^2, s_2^2, \cdots, s_k^2, y - \eta)$ 

which will still be normally distributed with the same mean and standard 
deviation. We shall see that such a function can be found, although the method 
to be followed leads us first to another expression 

(4)    $y - \eta = h(s_1^2, s_2^2, \cdots, s_k^2, x)$ 

which is simply the transposed form of (3). Once we have obtained $h$ we can 
solve out from (4) to obtain $x$. 

Since the distribution of $y$ is independent of $s_i^2$ we have 

(5)    $p(y \mid s^2)\,dy = \dfrac{1}{\sqrt{2\pi\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{(y - \eta)^2}{\Sigma\lambda_i\sigma_i^2}\right\}dy$. 

Transforming therefore to the new variable $x$ we have for given $s_i^2$ 

(6)    $p(x \mid s^2)\,dx = \dfrac{1}{\sqrt{2\pi\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{h^2(s^2, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h(s^2, x)}{\partial x}\,dx$ 

       $= j\{s^2, x, \Sigma\lambda_i\sigma_i^2\}\,dx$ (say). 

The unrestricted distribution of $x$ is then obtained by averaging over the joint 
distribution of the $s_i^2$. In order that $x$ should be a unit normal deviate we must 
therefore have 

(7)    $p(x) = \int\cdots\int j\{s^2, x, \Sigma\lambda_i\sigma_i^2\}\prod_i\{p(s_i^2)\,ds_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac12 x^2}$. 

We have to substitute from (1) and (6) into (7) and then choose the function 
$h(s^2, x)$ in such a manner that the equation is satisfied whatever may be the 
values of the unknown $\sigma_i^2$. To evaluate the function by the methods of numerical 
integration is probably impracticable except perhaps in some simple special 
cases. A series development is, however, quite feasible. 

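Symbolically we can write 

(8)    $j\{s^2, x, \Sigma\lambda_i\sigma_i^2\} = \exp\{\Sigma(s_i^2 - \sigma_i^2)\partial_i\}\,j\{w, x, \Sigma\lambda_i\sigma_i^2\}$ 

where $\partial_i$ denotes differentiation with respect to $w_i$ and subsequent equation to 
$\sigma_i^2$. Equation (7) then integrates out to give 

(9)    $\prod_i\left\{\left(1 - \dfrac{2\sigma_i^2\partial_i}{f_i}\right)^{-\frac12 f_i}e^{-\sigma_i^2\partial_i}\right\}j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac12 x^2}$ 

i.e. 

(10)    $\Theta\,j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac12 x^2}$ (say). 

The operator $\Theta$ must be expanded in powers of $\partial_i$ before it can be interpreted. 
When this is done we find 

(11)    $\Theta = \exp\left\{\Sigma\left(\dfrac{\sigma_i^4\partial_i^2}{f_i} + \dfrac43\dfrac{\sigma_i^6\partial_i^3}{f_i^2} + 2\dfrac{\sigma_i^8\partial_i^4}{f_i^3} + \cdots\right)\right\}$ 

(12)    $= 1 + \Sigma\dfrac{\sigma_i^4\partial_i^2}{f_i} + \left\{\dfrac43\Sigma\dfrac{\sigma_i^6\partial_i^3}{f_i^2} + \dfrac12\left(\Sigma\dfrac{\sigma_i^4\partial_i^2}{f_i}\right)^2\right\} + \cdots$. 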



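Our procedure now is to find successive approximations to $h(s^2, x)$. It will 
be convenient to denote by $h_r(s^2, x)$ an expression which equals $h(s^2, x)$ to terms 
of order $1/f_i^r$. Further let $c_{r+1}(s^2, x)$ be a corrective term which when added on 
to $h_r(s^2, x)$ will give a result correct to terms in $1/f_i^{r+1}$. Then to this order we 
shall have from (6) 

(13)    $\sqrt{2\pi}\,j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h_r(w, x)}{\partial x}$ 

        $+ \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{x^2\Sigma\lambda_i w_i}{\Sigma\lambda_i\sigma_i^2}\right\}\left(\dfrac{\partial c_{r+1}(w, x)}{\partial x} - \dfrac{x(\Sigma\lambda_i w_i)c_{r+1}(w, x)}{\Sigma\lambda_i\sigma_i^2}\right)$ 

remembering that the leading term in $h(w, x)$ is $x\sqrt{\Sigma\lambda_i w_i}$. 
Hence from (10) we find 

(14)    $\Theta\left\{\dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h_r(w, x)}{\partial x}\right\}$ 

        $+ \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}e^{-\frac12 x^2}\left(\dfrac{\partial c_{r+1}(\sigma^2, x)}{\partial x} - x\,c_{r+1}(\sigma^2, x)\right) = e^{-\frac12 x^2}$ 

i.e. 

(15)    $\dfrac{\partial}{\partial x}\left\{e^{-\frac12 x^2}\dfrac{c_{r+1}(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right\} + \Theta\left\{\dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\dfrac12\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h_r(w, x)}{\partial x}\right\} = e^{-\frac12 x^2}$. 

Given $h_r$ we can therefore proceed directly to $c_{r+1}$ and hence to $h_{r+1}$. 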

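3. Application to give terms in $1/f_i$. It will be sufficient illustration of the 
method, if we show here how to obtain $h_1$ from $h_0$. We have from (15) 

(16)    $\dfrac{\partial}{\partial x}\left\{e^{-\frac12 x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right\} + \left\{1 + \Sigma\dfrac{\sigma_i^4\partial_i^2}{f_i}\right\}\exp\left\{-\dfrac12\dfrac{x^2\Sigma\lambda_i w_i}{\Sigma\lambda_i\sigma_i^2}\right\}\sqrt{\dfrac{\Sigma\lambda_i w_i}{\Sigma\lambda_i\sigma_i^2}} = e^{-\frac12 x^2}$ 

i.e. 

(17)    $\dfrac{\partial}{\partial x}\left\{e^{-\frac12 x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right\} + \dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,\partial^2\left\{u^{\frac12}\exp(-\tfrac12 x^2 u)\right\} = 0$ 

where $\partial$ now denotes differentiation with respect to $u$ and subsequent equation 
to unity 

i.e. 

(18)    $\dfrac{\partial}{\partial x}\left\{e^{-\frac12 x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right\} = \dfrac14\dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,e^{-\frac12 x^2}(1 + 2x^2 - x^4)$ 

(19)    $e^{-\frac12 x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}} = \dfrac14\dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,e^{-\frac12 x^2}(x + x^3)$ 

whence 

(20)    $c_1(\sigma^2, x) = x\sqrt{\Sigma\lambda_i\sigma_i^2}\left[\dfrac{(1 + x^2)}{4}\dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\right]$. 

Hence to the terms in $1/f_i$ we have 

(21)    $y - \eta = h(s^2, x) = x\sqrt{\Sigma\lambda_i s_i^2}\left[1 + \dfrac{(1 + x^2)}{4}\dfrac{(\Sigma\lambda_i^2 s_i^4/f_i)}{(\Sigma\lambda_i s_i^2)^2}\right]$. 

Solving this out for $x$ we obtain to the same order 

(22)    $x = v\left[1 - \dfrac{(1 + v^2)}{4}\dfrac{(\Sigma\lambda_i^2 s_i^4/f_i)}{(\Sigma\lambda_i s_i^2)^2}\right]$, 

where $v$ equals $(y - \eta)/\sqrt{\Sigma\lambda_i s_i^2}$. To order $1/f_i$ we may regard $x$ as a unit normal 
deviate and hence determine the probability level corresponding to the observed 
ratio $v$. On the other hand if we wish to determine the value of $y - \eta$ which 
will lie on a given percentage level the expression (21) is the appropriate one 
to use. 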
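
The first-order formulas (21) and (22) are simple enough to evaluate directly. The sketch below is only an illustration: the function names are hypothetical, and the numerical inputs describe an assumed two-sample setting in which $y$ is a difference of means from samples of sizes 10 and 5, so that $\lambda_i = 1/n_i$ and $f_i = n_i - 1$. Since the series is carried only to terms in $1/f_i$, the results are approximate for small $f_i$.

```python
import math

def variance_ratio(s2, lam, f):
    """(sum of lam_i^2 s_i^4 / f_i) / (sum of lam_i s_i^2)^2, as in (21) and (22)."""
    num = sum(l * l * s * s / fi for s, l, fi in zip(s2, lam, f))
    den = sum(l * s for s, l in zip(s2, lam)) ** 2
    return num / den

def x_from_v(v, s2, lam, f):
    """Equation (22): approximate unit normal deviate for an observed ratio v."""
    return v * (1 - (1 + v * v) / 4 * variance_ratio(s2, lam, f))

def h_from_x(x, s2, lam, f):
    """Equation (21): value of y - eta lying at the normal deviate x."""
    scale = math.sqrt(sum(l * s for s, l in zip(s2, lam)))
    return x * scale * (1 + (1 + x * x) / 4 * variance_ratio(s2, lam, f))

def normal_cdf(x):
    """Unit normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical data: variance estimates 4 and 9 from samples of sizes 10 and 5.
s2, lam, f = (4.0, 9.0), (1 / 10, 1 / 5), (9, 4)
v = 2.0
x = x_from_v(v, s2, lam, f)
print(x, normal_cdf(x))                  # deviate and one-sided probability level
print(h_from_x(1.96, s2, lam, f))        # y - eta at the deviate 1.96
```

The correction always pulls $x$ toward zero relative to $v$, and it vanishes as the $f_i$ become large, in which case $x$ reduces to $v$.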

4. Further discussion. The present development is of course basically 
equivalent to that given in the previous paper. Indeed if we integrate (10) or 
(15) out with respect to x we arrive immediately at the formulae which were then 
obtained and which were illustrated by calculating terms to order 1 //*. In 
fact when calculating higher order terms it seems best to do this integration 
before carrying out the operation ©. The object of the present note is really to 
stress the fact that we are simply finding a function of the observations and of 
y — r\ which is distributed as a unit normal deviate, whatever the values the 
true <j\ may chance to possess. 

Finally, the remarks following equation (7) above should be somewhat amplified.
The equation asserts that the distribution of any arbitrary function $x$,
defined by (3), is

(23)  $p(x) = \int \cdots \int \frac{1}{\sqrt{2\pi \sum \lambda_i \sigma_i^2}} \exp\left[-\frac{1}{2}\,\frac{h^2(s^2, x)}{\sum \lambda_i \sigma_i^2}\right] \frac{\partial h(s^2, x)}{\partial x} \prod_i \{p(s_i^2)\, ds_i^2\},$

where $h(s^2, x)$ is the function obtained by solving out (3) for $y - \eta$. On carrying
out the integrations in (23) we shall in general obtain $p(x)$ as a function of $x$ and
$\sigma_i^2$. Our argument is that if $h$ be chosen properly the $\sigma_i^2$ will disappear from
$p(x)$, and $x$ will appear only in the form of the unit normal probability function.

To find $h(s^2, x)$ by a direct process of numerical integration would appear to
involve in the first instance the choice of a network of points for $x$ and $s_i^2$.
Suppose the range of $x$ is covered by $n_x$ points and the range of $s_i^2$ by $n_i$ points.
We may then as an approximation look on our task as that of finding the $(n_x \prod_i n_i)$
values of $h(s^2, x)$ corresponding to this network. Since (23) is to be true for all $x$
and $\sigma_i^2$, we can take in turn $n_i$ values of $\sigma_i^2$, and then (23) can be replaced by
$(n_x \prod_i n_i)$ simultaneous equations (it would be necessary to use some formula
expressing $\partial h(s^2, x)/\partial x$ in terms of values of $h(s^2, x)$ at discrete values of $x$, or
conceivably this may be avoided if we work with the integrated form). With
a proper choice of the points for $x$, $s_i^2$, and $\sigma_i^2$, we might expect to evaluate the
series $h(s^2, x)$ to any required degree of accuracy, but clearly as a general process
to be used over a whole range of values $f_i$ this approach would be too laborious.





It may indeed be queried whether theoretically, with an indefinitely fine
network of points, we shall be led to a unique function $h(s^2, x)$ with the common
sense properties, which, from general statistical considerations, we know it
should have in order to be acceptable. As with integral equations of a simpler
character, the passage from a discrete network to a continuum may raise problems,
but it is the author's opinion that the infinite ranges of $x$ and $s_i^2$ give us the
freedom which we require in the solution.

The author, however, prefers to approach the problem from the numerical
behavior of the series, of which (15) gives the general terms. Here the practical
issue appears to be to investigate the relation between the magnitude of the last
term retained and the $f_i$. The author hopes in a further paper to give some
results of an investigation of this character and also some tables facilitating the
calculation of $h(s^2, x)$.


REFERENCE 

[1] B. L. Welch, “The generalization of ‘Student's’ problem when several different
population variances are involved,” Biometrika, Vol. 34 (1947), pp. 28-33.


PROBABILITY SCHEMES WITH CONTAGION IN SPACE AND TIME 1 

By Félix Cernuschi* and Louis Castagnetto
Harvard University 

1. Summary. In many natural assemblies of elements, the probability of 
an event for a given element depends not only on the intrinsic nature of that 
particular element, but also on the states of some or all of the rest of the elements 
belonging to the same assembly. On the basis of this general idea of “contagion” 
some urn schemes are developed in this paper in which one has contagious 
influence in space and time. The most interesting result found is that in general 
the points of convergence of the probability of the assembly are given by some 
of the roots of an equation p = f(p) and that some of these roots, between zero
and one, represent stable states of the assembly, or points of convergence, and 
others represent unstable ones, or points of divergence. The two neighboring 
roots, (if they are single), of a root representing a point of convergence are un¬ 
stable values of the probability. Consequently, under certain conditions, the 
limiting probability may be made to have a finite jump by changing the initial 
probability by an arbitrarily small amount. The concrete cases developed in 
this paper can be considerably extended by similar methods by assuming more 
complicated and general assemblies and laws of contagion. 

1 On the suggestion of the referee, some parts of the original paper were deleted and 
some mathematical simplifications were introduced. 

* Research Associate at Harvard Astronomical Observatory and Guggenheim Fellow. 





2. Introduction. In the known probability schemes of contagion of Eggenberger
and Pólya [1], Greenwood and Yule [2], Lüders [3], Neyman [4], Feller [5]
and others [6], as well as in Markoff chains, different ways are considered in
which the previous results in a definite series of trials may influence the
probabilities of the future ones. All of these schemes consider possible influences of
the results of the different trials along the time axis, and consequently might
be called schemes of contagion in one dimension and one direction.

In many natural assemblies of individuals or elements, the probability of an 
event per individual or element depends not only on the intrinsic nature of the 
considered element but also on the states of the rest of the elements belonging 
to the same assembly. 

The purpose of this paper is to develop some simple schemes with urns in 
which there is a contagious influence in space and time and to show some of their 
consequences. The method which we have used to treat certain concrete cases 
could be applied to more complicated assemblies and laws of influence in space 
and time. 

3. Scheme of a closed assembly of urns in two dimensions. Let us consider
a set of $N$ urns arranged on a closed surface in such a way that each one of them
is surrounded by $m$ others. Let each urn contain a finite number of black and
white balls. In this paper the probability associated with an urn will refer to
the probability of obtaining a white ball if a single ball is drawn at random from
the urn. We shall assume that the initial probabilities are equal for all of the
urns and that the following law of influence holds: When, after a collective
trial, one finds that the ball drawn from a certain arbitrary urn, taken as the
central one, is white and that the corresponding results of the $m$ surrounding
urns give $l$ white and $s$ black balls, one multiplies the probability of obtaining a
white ball out of the central urn by the factor $\alpha_{1,1}^{l}\alpha_{1,2}^{s}$; if the ball drawn from the
central urn were black, without changing the given results of the surrounding
urns, one multiplies the considered probability by the factor $\alpha_{2,2}^{s}\alpha_{2,1}^{l}$. Under
the specified conditions, it is easily seen that the probability of obtaining a white
ball from a definite urn at the $i + 1$ trial will be, by considering all the possible
alternatives:

(1)  $p_{i+1} = m! \sum_{j=0}^{m} \frac{1}{j!\,(m - j)!}\left[p_i^2\,(p_i\alpha_{1,1})^{j}(q_i\alpha_{1,2})^{m-j} + p_i q_i\,(p_i\alpha_{2,1})^{j}(q_i\alpha_{2,2})^{m-j}\right]$
     $= f(p_i) = p_i^2\,(p_i\alpha_{1,1} + q_i\alpha_{1,2})^m + p_i q_i\,(p_i\alpha_{2,1} + q_i\alpha_{2,2})^m,$

where:

$p_i + q_i = 1.$

Consequently $p_i$ either converges to a root of the equation p = f(p) or tends to
infinity. As a probability greater than one or smaller than zero has no meaning,





we have to study the function y = f(p) between zero and one. In (1) we have
given an implicit form for y = f(p), corresponding to a particular case of influence;
by changing the law of influence we change the function f(p). In general
one can find graphically the roots of equation p = f(p) by plotting y = f(p) and
y = p and by determining the intersections of these two lines in the range
0 < p < 1. Later we shall give the values of these roots for some concrete
examples. From what we have shown it follows that if, for the considered
assembly of urns and for especially chosen values of the parameters of interconnection
and initial probabilities, the probability tends to some equilibrium
value, this must be a root of the equation p = f(p). As we shall see later, the
roots in the range 0 < p < 1 may represent stable or unstable states of the
assembly.
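The graphical search for roots can also be carried out numerically. The sketch below is an illustration only: it assumes the recursion has the form $f(p) = p^2(p\alpha_{1,1} + q\alpha_{1,2})^m + pq(p\alpha_{2,1} + q\alpha_{2,2})^m$, the parameter values are hypothetical, and the rule "stable when $|f'(p)| < 1$" is the usual criterion for a fixed point of an iteration, adopted here as an assumption rather than quoted from the paper.

```python
from math import comb

def f_sum(p, a, m):
    """f(p) as the explicit sum over j white results among the m neighbours."""
    q = 1.0 - p
    total = 0.0
    for j in range(m + 1):
        white = p * p * (p * a[0][0]) ** j * (q * a[0][1]) ** (m - j)
        black = p * q * (p * a[1][0]) ** j * (q * a[1][1]) ** (m - j)
        total += comb(m, j) * (white + black)  # m!/(j!(m-j)!) is comb(m, j)
    return total

def f_closed(p, a, m):
    """The same function after summing the binomial series."""
    q = 1.0 - p
    return (p * p * (p * a[0][0] + q * a[0][1]) ** m
            + p * q * (p * a[1][0] + q * a[1][1]) ** m)

def roots_in_unit_interval(f, grid=2000, tol=1e-12):
    """Find sign changes of f(p) - p inside (0, 1) and refine them by bisection."""
    g = lambda p: f(p) - p
    pts = [(k + 0.5) / grid for k in range(grid)]
    roots = []
    for lo, hi in zip(pts, pts[1:]):
        if g(lo) * g(hi) < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

def slope(f, p, h=1e-6):
    """Central-difference estimate of f'(p)."""
    return (f(p + h) - f(p - h)) / (2.0 * h)

if __name__ == "__main__":
    a = [[0.9, 1.3], [1.3, 0.9]]  # hypothetical alpha_{r,c}; a[0][0] is alpha_{1,1}
    f = lambda p: f_closed(p, a, 2)
    for r in roots_in_unit_interval(f):
        kind = "stable" if abs(slope(f, r)) < 1 else "unstable"
        print(round(r, 6), kind)
```

The endpoints 0 and 1 are deliberately excluded from the search grid, so a root at p = 0 has to be checked separately.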

Let us consider now a general method for finding the explicit form of the
function f(p) corresponding to laws of influence similar to the one used by Pólya.

Assume that the trial $i$ results in the drawing of $l$ white balls and $s$ black balls
from the $m$ urns surrounding the central one. Then we add $l w_1$ white and
$s b_1$ black balls to the central urn if the result of the central urn was white, and
$l w_2$ white and $s b_2$ black balls if it was black. It is easy to show that under these
conditions the probability in the trial $i + 1$ is related to the probability in
trial $i$ by the following formula:

p. +1 = Vi [' \h ~ tT^Pi <T + Qi 4rl clt 

^2^ *'° L vti * 

+ (1 - Pi) f h* t 7 *(pi tv + qi dt 

where $W_i$ and $N_i$ are the number of white balls and the total number of balls,
respectively, in the central urn before trial $i$. Relation (2) permits us to study
several interesting schemes. It is easy to see that all the possible schemes which
can be represented by relations of type (2) give only values of the probability
in the interval zero and one; and consequently we do not need to make the
restriction in the analysis of the equation p = f(p) that was necessary in the
previous scheme, represented by equation (1).

For the case $w_1 = b_1 = c_1$, $w_2 = b_2 = c_2$, we obtain from (2)

(3)  $p_{i+1} = p_i\,\frac{W_i + m p_i c_1}{N_i + m c_1} + (1 - p_i)\,\frac{W_i + m c_2 p_i}{N_i + m c_2}.$

If $c_1 = c_2$, (3) gives

(4)  $p_{i+1} = p_i.$

If one takes $c_1 = k_1 N_i$ and $c_2 = k_2 N_i$, (3) becomes


<5) Pi+l = * + (1 - P<) !*£ = /( P< ) 

and the equation p = /(p) has, in this case, the roots 0 and 1. 





When w x = b 2 = kxNi and h — w 2 = one has to replace h(d/dtx) by 

l'z{d/dtz) in the second term of (2); then if we take m = 2, 


/p\ m _ P* "b 2ki m „ / Pi 

(G) Pi+i - 1 Loh + Vx 


1 + 2kx 

In particular, if ki = fe = k, one obtains 


. o _J^+A_ 

2kz 1 + k\ + ki 


- 3 


p% + 2k 

1 + 2ki 


;)= 




(7) pi+l = nr^ l * kpi< ~ (4fc _ 1)p< + 2k] = f(Pi) ’ 


and the solutions of the equation p = f(p) are p = 1/2 and 1. By considering
the behavior of y = f(p) one finds that the stable solution is given by the root 1/2;
consequently if one starts with any value of 0 < p < 1 the probability tends
to the limiting value 1/2. If $k_1 = 0$, $k_2 \neq 0$, by simple calculations, one obtains
from (6) that the solutions of p = f(p), in this case, are zero and one.

The equation p = f(p), as given by (6), always has the solution 1. In order
to have the other two roots real, one has to satisfy:

Jfci(l + 2 fe) (2 + h + 3 fe) 2 > 4(1 + fci + h) 

( 8 ) 

[(fci + + 2(fc, - h) - 4 *S]. 


A simple and interesting application of relation (2) is for the case of two urns, 
characterized by m = 1. From (2) we obtain: 


where 



Pi + k s 
1 + fc 



== k\Ni , bx = , w 2 — k*N <, 6 2 = kiNi . 

The equation p = f(p), as given by (9), has the roots 0 and 1; and one may fix
the value of the third root by conveniently choosing the values of the parameters.

Applying (2) for an arbitrary value of $m$ and integrating by parts, it is seen
that in general the equation p = f(p) is of degree $m + 2$ and consequently, by
choosing appropriate values for the parameters $k_1, k_2, k_3, k_4$, each of which may
be between $-1$ and $\infty$, one can expect several roots in the range 0 < p < 1.
One can easily generalize our relation (2) for cases in which $w_1, w_2, b_1, b_2$ are
given functions of the probability $p_i$. Even in this most general case it is simple
to see that one would have a recursion formula of the type $p_{i+1} = f(p_i)$ and, as
in the elementary cases which we have considered, the points of equilibrium of
the closed assembly of urns will be given by those solutions, in the range 0 < p
< 1, of the equation p = f(p), where the derivative of y = f(p) is negative.
Consequently the two neighboring roots, if they are single, of a root representing a
point of convergence are unstable values of the probability. Therefore, under
certain conditions, the limiting probability may take a finite jump if the initial
probability is changed by an arbitrarily small amount. This is, we think, the
most important consequence of the contagion schemes that we propose. We





consider that many actual cases of contagion could be better understood by 
schemes of the type that we are studying. 

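Let us consider now some simple cases of relation (1). If we take

$\alpha_{1,1} = \alpha_{2,2} = \alpha_1$, $\alpha_{1,2} = \alpha_{2,1} = \alpha_2$ and $m = 2$,

representing a closed ring of urns, one obtains:

(10)  $p_{i+1} = p_i^2(\alpha_1 p_i + \alpha_2 q_i)^2 + p_i q_i(\alpha_2 p_i + \alpha_1 q_i)^2$
      $= \alpha_1^2 p_i + (p_i^2 - p_i^3)\,[(\alpha_1 + \alpha_2)^2 - 4\alpha_1^2] = f(p_i).$

The equation p = f(p), corresponding to this recursion formula, always has the
solution p = 0. The other two solutions are given by

(11)  $P_1, P_2 = \frac{1}{2} \pm \frac{1}{2}\sqrt{\frac{(\alpha_1 + \alpha_2)^2 - 4}{(\alpha_1 + \alpha_2)^2 - 4\alpha_1^2}}.$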

These roots will be between 0 and 1 when

(12)  $2 < \alpha_1 + \alpha_2$, $1 > \alpha_1 < \alpha_2$,

or

(12')  $2 > \alpha_1 + \alpha_2$, $1 < \alpha_1 > \alpha_2$.

We would have $P_1 > 0$ and $P_2 < 0$ if

(13)  $2 < \alpha_1 + \alpha_2$, $1 < \alpha_1 < \alpha_2$,

or

(13')  $2 > \alpha_1 + \alpha_2$, $1 > \alpha_1 > \alpha_2$,

and $P_1 = P_2$ when

(14)  $\alpha_1 + \alpha_2 = 2$, $\alpha_1 \neq 1$.


Let us now study the general behavior of (10). For the conditions (12') we
have:

(15)  $p_{i+1} - p_i = a\,p_i\,(p_i - P_1)(p_i - P_2),$

where $a = 4\alpha_1^2 - (\alpha_1 + \alpha_2)^2 > 0$.

If $0 < p_i < P_2$, one obtains from (15) by use of elementary algebra:

(16)  $\left|\frac{p_{i+1} - p_i}{P_1 - p_i}\right| = a\,p_i\,|P_2 - p_i| < 1.$

Consequently if $p_i > P_2$ the sequence $p_i$ increases monotonically. Otherwise
$p_{i+1}$ will lie between $P_1$ and $p_i$ and will tend to $P_1$ without ever reaching the other
side of this point. In a similar way it is possible to prove the convergence to a
constant for the most general equations of the type p = f(p) when they have
roots between zero and one.

Let us give some numerical results. For $\alpha_1 = 0.95$ and $\alpha_2 = 1.1$, from (10)
one obtains: $P_1 = 0.1$ and $P_2 = 0.9$. It is easily seen that, in this case, if





$0 < p_1 < 0.1$, the limiting value of $p_i$ will be zero; if $p_1 > 0.1$, the limiting value
will be 0.9. The interesting point is that if the initial probability is in the
neighborhood of 0.1, an infinitesimal change in its value may produce a finite
change in the stable limiting probabilities; and that for the initial probability
equal to 0.1 one would have an unstable equilibrium of the system. This consideration
shows why it is important to know how the probability converges
towards a certain point. As we have previously shown, the points of convergence
are roots of the equation p = f(p), but there are roots which are not points of
convergence.

Similar reasoning could be applied to more complicated systems belonging to 
our general scheme of contagion. Consequently, the most important result is 
not that the considered assembly may have a probability tending to some value 
in the range 0 < p < 1, but that under certain conditions the limiting probability 
may jump from one value to another by changing the initial probability by an 
arbitrarily small amount. 
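The jump in the limiting probability can be reproduced by direct iteration. The following sketch assumes the ring recursion $p_{i+1} = p_i^2(\alpha_1 p_i + \alpha_2 q_i)^2 + p_i q_i(\alpha_2 p_i + \alpha_1 q_i)^2$ and the closed form of its two nonzero roots; the values $\alpha_1 = 0.9$, $\alpha_2 = 1.3$ are illustrative choices satisfying $\alpha_1 < 1 < \alpha_2$ and $\alpha_1 + \alpha_2 > 2$, not figures from the paper.

```python
import math

def ring_map(p, a1, a2):
    """One step of the closed-ring recursion (m = 2)."""
    q = 1.0 - p
    return p * p * (a1 * p + a2 * q) ** 2 + p * q * (a2 * p + a1 * q) ** 2

def nonzero_roots(a1, a2):
    """The two roots of p = f(p) other than p = 0, smaller root first."""
    s2 = (a1 + a2) ** 2
    r = math.sqrt((s2 - 4.0) / (s2 - 4.0 * a1 * a1))
    return 0.5 * (1.0 - r), 0.5 * (1.0 + r)

def limit(p, a1, a2, steps=2000):
    """Iterate the recursion a fixed number of times from the initial probability p."""
    for _ in range(steps):
        p = ring_map(p, a1, a2)
    return p

if __name__ == "__main__":
    a1, a2 = 0.9, 1.3  # illustrative values only
    lower, upper = nonzero_roots(a1, a2)
    print("roots:", lower, upper)
    print("start just below the lower root:", limit(lower - 1e-3, a1, a2))
    print("start just above the lower root:", limit(lower + 1e-3, a1, a2))
```

With these values the two starting points differ by 0.002, yet the first sequence dies away to zero while the second settles at the upper root.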


REFERENCES 

[1] F. Eggenberger and G. Pólya, Zeits. für Ang. Math. und Mech., Vol. 3 (1923), p. 279.

[2] M. Greenwood and G. U. Yule, Roy. Stat. Soc. Jour., Vol. 83 (1920), p. 255.

[3] R. Lüders, Biometrika, Vol. 26 (1934), p. 108.

[4] J. Neyman, Annals of Math. Stat., Vol. 10 (1939), p. 35.

[5] W. Feller, Annals of Math. Stat., Vol. 14 (1943), p. 389.

[6] F. Cernuschi and E. Saleme, Anales Soc. Cientifica Argentina, Vol. 138 (1944), p. 201.


FITTING CURVES WITH ZERO OR INFINITE END POINTS 

By Edmund Pinney 
Oregon State College 

The problem of determining a suitable equation to fit an empirically deter¬ 
mined curve over a given interval has been of great importance in statistical 
work, in experimental science, and in engineering technology. Since infinitely 
many types of equations may be made to fit the data with required accuracy, 
the choice of a “suitable” type of equation depends on the qualitative nature 
of the empirical curve, on the use to which the equation is to be put, and upon 
considerations of simplicity. 

As a function type, the polynomial has, because of its simplicity, been enor¬ 
mously useful. The function type studied here is a little more general than the 
polynomial type, being particularly useful in the case of empirical curves that 
become zero or infinity at one or both ends of the interval. 

Without loss of generality the interval in which the equation is to fit the curve
may be taken as 0 < x < 1. It is assumed that, by numerical means or otherwise,
a finite set of moments $\mu_m = \int_0^1 y\,x^m\,dx$ may be computed, $y$ being the
ordinate of the empirical curve.





The problem to be considered here is that of determining a function $f(x)$ of
the form

(1)  $f(x) = x^{\alpha}(1 - x)^{\beta} \sum_{p=0}^{n} a_p x^p$,  $R(\alpha) > -1$, $R(\beta) > -1$,

such that

(2)  $\int_0^1 f(x)\,x^m\,dx = \mu_m$

as $m$ ranges from zero to the number of the highest moment known. $f(x)$ is
then an approximation to $y$ which may be written

(3)  $y \approx f(x)$.


Theorem 1°. Given a finite set of moments $\mu_0, \mu_1, \mu_2, \ldots, \mu_n$, and given that
$R(\alpha) > -1$, $R(\beta) > -1$, define

(4)  $S_p(\alpha, \beta) = \frac{\Gamma(p + \alpha + 1)}{\Gamma(p + \alpha + \beta + 1)} \sum_{m=0}^{p} \binom{p}{m} \frac{\Gamma(m + p + \alpha + \beta + 1)}{\Gamma(m + \alpha + 1)}\,(-)^m \mu_m,$

(5)  $a_k^{(n)} = \frac{(-)^k}{k!\,\Gamma(k + \alpha + 1)} \sum_{p=k}^{n} \frac{(2p + \alpha + \beta + 1)\,\Gamma(p + k + \alpha + \beta + 1)}{(p - k)!\,\Gamma(p + \beta + 1)}\,S_p(\alpha, \beta),$

(6)  $f(x) = x^{\alpha}(1 - x)^{\beta} \sum_{k=0}^{n} a_k^{(n)} x^k.$

Then $f(x)$ will satisfy (2) for $m = 0, 1, \ldots, n$.

2°. If, in addition to 1°, $\mu_{n+1}$ is known and $\alpha$ and $\beta$ satisfy

(7)  $S_{n+1}(\alpha, \beta) = 0,$

then $f(x)$ will satisfy (2) for $m = n + 1$ also.

3°. If, in addition to 1° and 2°, $\mu_{n+2}$ is also known, and if $\alpha$, $\beta$ also satisfy

(8)  $S_{n+2}(\alpha, \beta) = 0,$

then $f(x)$ will satisfy (2) for $m = n + 2$ as well.

Proof. Let $P_m^{(\alpha,\beta)}(z)$ be the Jacobi polynomial of order $m$ defined in terms
of the hypergeometric function by

(9)  $P_m^{(\alpha,\beta)}(z) = \binom{m + \alpha}{m} F(-m,\ m + \alpha + \beta + 1;\ \alpha + 1;\ \tfrac{1}{2} - \tfrac{1}{2}z).$

Let $P_m^{(\alpha,\beta)}(1 - 2\mu)$ symbolically represent the expression gotten by substituting
$\mu_k$ for $x^k$ in the expansion of the polynomial $P_m^{(\alpha,\beta)}(1 - 2x)$. There exist numbers
$A_{m,q}$ such that

(10)  $x^m = \sum_{q=0}^{m} A_{m,q}\,P_q^{(\alpha,\beta)}(1 - 2x).$



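Also

(11)  $\mu_m = \sum_{q=0}^{m} A_{m,q}\,P_q^{(\alpha,\beta)}(1 - 2\mu).$

For $R(\alpha) > -1$, $R(\beta) > -1$, define

(12)  $f(x) = x^{\alpha}(1 - x)^{\beta} \sum_{p=0}^{n} \frac{(2p + \alpha + \beta + 1)\,p!\,\Gamma(p + \alpha + \beta + 1)}{\Gamma(p + \alpha + 1)\,\Gamma(p + \beta + 1)}\,P_p^{(\alpha,\beta)}(1 - 2\mu)\,P_p^{(\alpha,\beta)}(1 - 2x).$

Then by (10), for $m = 0, 1, \ldots, n$,

$\int_0^1 f(x)\,x^m\,dx = \sum_{p=0}^{n} \frac{(2p + \alpha + \beta + 1)\,p!\,\Gamma(p + \alpha + \beta + 1)}{\Gamma(p + \alpha + 1)\,\Gamma(p + \beta + 1)}\,P_p^{(\alpha,\beta)}(1 - 2\mu)$
$\qquad \times \sum_{q=0}^{m} A_{m,q} \int_0^1 x^{\alpha}(1 - x)^{\beta}\,P_q^{(\alpha,\beta)}(1 - 2x)\,P_p^{(\alpha,\beta)}(1 - 2x)\,dx.$

By the orthogonality of the Jacobi polynomials, [1; §4.3],

$\int_0^1 f(x)\,x^m\,dx = \sum_{p=0}^{m} A_{m,p}\,P_p^{(\alpha,\beta)}(1 - 2\mu).$

By (11),

$\int_0^1 f(x)\,x^m\,dx = \mu_m$,  $(m = 0, 1, \ldots, n)$.

It follows from (2) that $f(x)$ as defined in (12) is the $f(x)$ of (1). It remains to be
shown that (12) may be expressed in the form (4)-(6).

From (9),

(13)  $P_p^{(\alpha,\beta)}(1 - 2x) = \frac{\Gamma(p + \alpha + 1)}{\Gamma(p + \alpha + \beta + 1)} \sum_{m=0}^{p} \frac{(-)^m}{m!\,(p - m)!}\,\frac{\Gamma(p + m + \alpha + \beta + 1)}{\Gamma(m + \alpha + 1)}\,x^m,$

so by (4),

(14)  $P_p^{(\alpha,\beta)}(1 - 2\mu) = \frac{1}{p!}\,S_p(\alpha, \beta).$

Inserting (13) and (14) into (12),

$f(x) = x^{\alpha}(1 - x)^{\beta} \sum_{p=0}^{n} \frac{(2p + \alpha + \beta + 1)}{\Gamma(p + \beta + 1)}\,S_p(\alpha, \beta) \sum_{k=0}^{p} \frac{(-)^k}{k!\,(p - k)!}\,\frac{\Gamma(p + k + \alpha + \beta + 1)}{\Gamma(k + \alpha + 1)}\,x^k$

$= x^{\alpha}(1 - x)^{\beta} \sum_{k=0}^{n} \frac{(-)^k x^k}{k!\,\Gamma(k + \alpha + 1)} \sum_{p=k}^{n} \frac{(2p + \alpha + \beta + 1)\,\Gamma(p + k + \alpha + \beta + 1)}{(p - k)!\,\Gamma(p + \beta + 1)}\,S_p(\alpha, \beta)$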




$= x^{\alpha}(1 - x)^{\beta} \sum_{k=0}^{n} a_k^{(n)} x^k,$

by (5), so the $f(x)$ of (12) may be expressed in the form (4)-(6), and part 1° of
the theorem is established.

If (7) holds, by (5), $a_k^{(n+1)} = a_k^{(n)}$ for $k = 0, 1, \ldots, n$, and $a_{n+1}^{(n+1)} = 0$.
Therefore, in (6),

$f(x) = x^{\alpha}(1 - x)^{\beta} \sum_{k=0}^{n+1} a_k^{(n+1)} x^k,$

and by part 1°, for the case in which $n$ is replaced by $n + 1$, it follows that (2)
holds for $m = n + 1$, so part 2° is established. The establishment of part 3°
is essentially the same.
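Part 1° of the theorem lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: it takes $S_p(\alpha, \beta) = \frac{\Gamma(p+\alpha+1)}{\Gamma(p+\alpha+\beta+1)}\sum_m \binom{p}{m}\frac{\Gamma(m+p+\alpha+\beta+1)}{\Gamma(m+\alpha+1)}(-)^m\mu_m$ and $a_k^{(n)} = \frac{(-)^k}{k!\,\Gamma(k+\alpha+1)}\sum_{p \ge k}\frac{(2p+\alpha+\beta+1)\Gamma(p+k+\alpha+\beta+1)}{(p-k)!\,\Gamma(p+\beta+1)}S_p$, restricts itself to real $\alpha$, $\beta$, and uses a hypothetical test curve $y = x^{1/2}(1-x)^{3/2}(1 + 2x)$ whose moments are known Beta integrals. The function names are not from the paper.

```python
from math import comb, factorial, gamma

def S(p, alpha, beta, mu):
    """S_p(alpha, beta) built from the moments mu[0..p]."""
    lead = gamma(p + alpha + 1) / gamma(p + alpha + beta + 1)
    return lead * sum(
        comb(p, m) * gamma(m + p + alpha + beta + 1) / gamma(m + alpha + 1)
        * (-1) ** m * mu[m]
        for m in range(p + 1)
    )

def coefficients(n, alpha, beta, mu):
    """Coefficients a_k^(n), k = 0..n, of the fitted curve."""
    s = [S(p, alpha, beta, mu) for p in range(n + 1)]
    a = []
    for k in range(n + 1):
        inner = sum(
            (2 * p + alpha + beta + 1) * gamma(p + k + alpha + beta + 1)
            / (factorial(p - k) * gamma(p + beta + 1)) * s[p]
            for p in range(k, n + 1)
        )
        a.append((-1) ** k / (factorial(k) * gamma(k + alpha + 1)) * inner)
    return a

def beta_fn(u, v):
    """Euler Beta function B(u, v)."""
    return gamma(u) * gamma(v) / gamma(u + v)

def fitted_moment(m, a, alpha, beta):
    """Integral over (0, 1) of x^m * x^alpha (1-x)^beta * sum_k a_k x^k."""
    return sum(ak * beta_fn(alpha + k + m + 1, beta + 1) for k, ak in enumerate(a))

if __name__ == "__main__":
    alpha, beta = 0.5, 1.5
    # Moments of the hypothetical curve x^0.5 (1-x)^1.5 (1 + 2x).
    mu = [beta_fn(m + 1.5, 2.5) + 2 * beta_fn(m + 2.5, 2.5) for m in range(2)]
    a = coefficients(1, alpha, beta, mu)
    print("a_0, a_1 =", a)
    print("moments reproduced:", [fitted_moment(m, a, alpha, beta) for m in range(2)], mu)
```

Because the test curve already has the assumed form with $n = 1$, the computed coefficients should come back as 1 and 2 up to rounding.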

In applying this theorem to the problem of empirical curve fitting, it follows
from (6) that the constants $\alpha$ and $\beta$ should differ from zero only if the empirical
curve approaches zero or infinity at one or both of its endpoints. With this
in mind the following rules may be stated:

Case A. If, in the empirical curve, $f(0) \neq 0$ or $\infty$, and $f(1) \neq 0$ or $\infty$, set
$\alpha = \beta = 0$, and let $n$ be one less than the number of moments that it is desired
to fit.

Case B. If $f(0) = 0$ or $\infty$ and $f(1) \neq 0$ or $\infty$, set $\beta = 0$ and determine $\alpha$ from
(7), $n$ being two less than the number of moments that it is desired to fit.

Case C. If $f(0) \neq 0$ or $\infty$ and $f(1) = 0$ or $\infty$, set $\alpha = 0$ and determine $\beta$
from (7), $n$ being two less than the number of moments that it is desired to fit.

Case D. If $f(0) = 0$ or $\infty$ and $f(1) = 0$ or $\infty$, determine both $\alpha$ and $\beta$ from
the two equations (7) and (8), $n$ being three less than the number of moments
that it is desired to fit.

It may happen that these processes cannot be carried out, or at least cannot be
conveniently carried out. If this is the case, $\alpha$ or $\beta$ may be set arbitrarily and $n$
taken as one unit higher than before, or both $\alpha$ and $\beta$ may be set, and $n$ taken
as two units higher than before.

In Case D, above, the solution of equations (7) and (8) may often prove
difficult, making it advisable to follow the suggestions of the last paragraph.
In certain special cases, however, their solution is not difficult.

Suppose, for example, the moments satisfied the equations

(15)  $\mu_m = \sum_{q=0}^{m} (-)^q \binom{m}{q} \mu_q$,  $m = 0, 1, \ldots$.

If this is substituted into (4), and the order of summation reversed, on making
use of the identity

(16)  $\sum_{p=0}^{n} \binom{n}{p} \frac{\Gamma(p + a)}{\Gamma(p + \nu)}\,(-)^p = (-)^n\,\frac{\Gamma(a)\,\Gamma(a - \nu + 1)}{\Gamma(a - \nu - n + 1)\,\Gamma(n + \nu)},$

one obtains

(17)  $S_p(\alpha, \beta) = (-)^p S_p(\beta, \alpha).$



Therefore

(18)  $S_{2p+1}(\alpha, \alpha) = 0.$

When $n$ is an integer, either $n + 1$ or $n + 2$ is odd. Therefore when (15)
holds, one of either (7) or (8) will be satisfied identically if we take $\beta = \alpha$. The
other may then be solved for $\alpha$.

As an example, suppose one had the moments $\mu_0 = 1$, $\mu_1 = 1/2$, $\mu_2 = 7/24$,
$\mu_3 = 3/16$, $\mu_4 = 31/240$, and wished to obtain an $f(x)$ such that $f(0) = 0$, $f(1) = 0$.
In this case $n = 2$, and (15) is satisfied. It follows that (7) is satisfied identically when
$\beta = \alpha$, and (8) gives

$\frac{\Gamma(2\alpha + 5)}{\Gamma(\alpha + 1)} + 4\,\frac{\Gamma(2\alpha + 6)}{\Gamma(\alpha + 2)}\left(-\frac{1}{2}\right) + 6\,\frac{\Gamma(2\alpha + 7)}{\Gamma(\alpha + 3)}\left(\frac{7}{24}\right) + 4\,\frac{\Gamma(2\alpha + 8)}{\Gamma(\alpha + 4)}\left(-\frac{3}{16}\right) + \frac{\Gamma(2\alpha + 9)}{\Gamma(\alpha + 5)}\left(\frac{31}{240}\right) = 0.$

This easily reduces to

$1 - 4\,\frac{\alpha + 5/2}{\alpha + 1} + 7\,\frac{(\alpha + 5/2)(\alpha + 3)}{(\alpha + 1)(\alpha + 2)} - 6\,\frac{(\alpha + 5/2)(\alpha + 7/2)}{(\alpha + 1)(\alpha + 2)} + \frac{31}{15}\,\frac{(\alpha + 5/2)(\alpha + 7/2)}{(\alpha + 1)(\alpha + 2)} = 0,$

which reduces to the quadratic

$4\alpha^2 - 6\alpha + 5 = 0,$

from which

(19)  $\alpha = \beta = 3/4 \pm (1/4)\sqrt{11}\,i.$

These may be substituted into (4)-(6) to complete the solution.
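The arithmetic of this example can be checked mechanically. The sketch below is an illustration only: it assumes the moments 1, 1/2, 7/24, 3/16, 31/240, takes $\beta = \alpha$, and evaluates $S_p(\alpha, \alpha)$ after dividing out the common factor $\Gamma(2\alpha + p + 1)/\Gamma(\alpha + p + 1)$ and the leading Gamma ratio, so that only finite products remain and complex $\alpha$ can be used without a complex Gamma function. The helper name `reduced_S` is hypothetical.

```python
import cmath
from fractions import Fraction
from math import comb

MU = [Fraction(1), Fraction(1, 2), Fraction(7, 24), Fraction(3, 16), Fraction(31, 240)]

def symmetric(mu):
    """Check the symmetry condition mu_m = sum_q (-1)^q C(m, q) mu_q exactly."""
    return all(
        mu[m] == sum((-1) ** q * comb(m, q) * mu[q] for q in range(m + 1))
        for m in range(len(mu))
    )

def reduced_S(p, alpha, mu=MU):
    """S_p(alpha, alpha) up to a nonvanishing factor, as a polynomial in alpha."""
    total = 0
    for m in range(p + 1):
        term = comb(p, m) * (-1) ** m * float(mu[m])
        for j in range(m):              # Gamma(2a+p+1+m) / Gamma(2a+p+1)
            term *= 2 * alpha + p + 1 + j
        for j in range(m + 1, p + 1):   # Gamma(a+p+1) / Gamma(a+m+1)
            term *= alpha + j
        total += term
    return total

if __name__ == "__main__":
    print("moments symmetric:", symmetric(MU))
    print("S_3 at an arbitrary alpha:", reduced_S(3, 0.37))
    for sign in (1, -1):
        alpha = (6 + sign * cmath.sqrt(36 - 80)) / 8  # roots of 4a^2 - 6a + 5 = 0
        print(alpha, abs(reduced_S(4, alpha)))
```

The odd-order quantity vanishes for every $\alpha$, while the fourth-order one vanishes only at the two complex roots of the quadratic (and at the spurious values $-3$ and $-4$ introduced by the factor divided out).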

REFERENCE 

[1] G. Szegö, Orthogonal Polynomials, Amer. Math. Soc. Colloquium Pub., No. 23, 1939.

CONSISTENCY OF SEQUENTIAL BINOMIAL ESTIMATES 

By J. Wolfowitz 
Columbia University 

The notion of consistency of an estimate, introduced by R. A. Fisher, applies 
to a sequence of estimates which converge stochastically, with boundlessly 
increasing sample size, to the parameter (or parameters) being estimated. Each
estimate is a function of a sample of observations, the number in each sample 
being determined independently of the observations themselves. In sequential 
estimation, on the other hand, the number of observations is itself a chance 





variable, determined by the sequence of observations and the application to 
them of a rule which may be part of a sequential test. In what follows we 
shall consider that the operation of sequential estimation is associated with a 
sequential test. 1 

The advantage of using consistent estimates is such as to suggest extension 
of the idea of consistency to sequential estimation. In the present paper we 
shall be concerned only with the estimation of a binomial probability (p, say). 
The obvious extension is that a sequence of estimates, each with its associated 
test, is consistent if the estimates converge stochastically to p. 

Since the number of observations required by a sequential test is a chance 
variable, a parallel to the classical sequence of samples of increasing size would 
be a sequence of sequential tests whose average (in some sense) sample sizes 
increase without limit. It seems reasonable to associate only such a sequence 
of estimates with this sequence of tests as will converge stochastically to p, 
i.e., be consistent. 

Let $z$ be a chance variable which takes the distinct values $c_1$ and $c_2$ with
probabilities $p$, $0 < p < 1$, and $q = 1 - p$, respectively. Let $z_1, \ldots, z_n$ be a sequence
of independent observations on $z$ which terminates with the $n$th according to the
specific sequential test under consideration. Denote by $x$ and $y$, respectively,
the number of observations $c_2$ and $c_1$ in this sequence. Then $x$, $y$ and $n = x + y$
are all chance variables. The couple $g = (x, y)$ is called a boundary point of
index $n$ (see [1]). The sequence of observations which terminates at $g$ is called a
path. Let $k(g)$ denote the number of paths which terminate at $g$, and let $k^*(g)$
denote the number of these paths whose first observation is $c_1$. The “points”
on the various paths together with all the points $g$ constitute the “region” under
discussion.

Let $P\{n = j\}$ denote the probability of the relation in braces. If

$\sum_{j=1}^{\infty} P\{n = j\} = 1,$

the region is called closed. Only closed regions will be considered below, so that
this assumption will henceforth be made without explicit formulation. It has
been shown by Girshick, Mosteller, and Savage [1], that $\hat{p}(g) = k^*(g)/k(g)$
is an unbiased estimate of $p$ for any closed region $R$, i.e.,

$\sum \hat{p}(g)\,k(g)\,p^y q^x \equiv p,$

where the summation takes place over all the boundary points $g$ of $R$. For
many important regions this estimate is the unique unbiased estimate.
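The path counts $k(g)$ and $k^*(g)$ can be obtained by stepping through the lattice of points $(x, y)$, and the unbiasedness identity then checked directly. The following sketch is an illustration under stated assumptions: the stopping rule (stop when $|y - x| \ge 3$ or $n = 12$) is hypothetical and chosen only because it gives a small closed region, and the function names are not from the paper.

```python
def boundary_counts(stop, start):
    """Count, for each boundary point g = (x, y), the paths from `start` that end at g.

    `stop(x, y)` must eventually return True along every path (a bounded rule),
    otherwise the loop does not terminate.
    """
    frontier, boundary = {start: 1}, {}
    while frontier:
        nxt = {}
        for (x, y), c in frontier.items():
            if stop(x, y):
                boundary[(x, y)] = boundary.get((x, y), 0) + c
            else:
                for pt in ((x + 1, y), (x, y + 1)):  # observe c_2, or observe c_1
                    nxt[pt] = nxt.get(pt, 0) + c
        frontier = nxt
    return boundary

def expected_estimate(stop, p):
    """Return (sum of k(g) p^y q^x, sum of p_hat(g) k(g) p^y q^x) over the boundary."""
    q = 1.0 - p
    k = boundary_counts(stop, (0, 0))
    k_star = boundary_counts(stop, (0, 1))  # paths whose first observation is c_1
    total = sum(c * p ** y * q ** x for (x, y), c in k.items())
    mean = sum(
        (k_star.get((x, y), 0) / c) * c * p ** y * q ** x for (x, y), c in k.items()
    )
    return total, mean

if __name__ == "__main__":
    rule = lambda x, y: abs(y - x) >= 3 or x + y >= 12  # hypothetical closed region
    for p in (0.2, 0.5, 0.7):
        print(p, expected_estimate(rule, p))
```

Counting paths from the point $(0, 1)$ gives $k^*(g)$ because a path that begins with $c_1$ passes through that point, provided the rule never stops at the origin.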

Let there be given an infinite sequence of sequential tests with each of which
we associate the estimate $\hat{p}(g)$. Consider the $i$th one of these, and let $n_{0i}$ be
the smallest number of observations required for a decision, i.e., $n_{0i}$ is the smallest

1 Really all that is required is a rule for terminating the observations such that its region 
R is closed (see below). However, we defer to conventional statistical usage in referring 
to “tests.” 





value of $j$ for which $P\{n = j\} \neq 0$. The theorem proved below asserts that if
$n_{0i}$ approaches infinity with $i$ the estimate $\hat{p}(g)$ converges stochastically to $p$.
To put it in other words: if $T_1, T_2, \ldots$ is the sequence of tests, and $\epsilon_1$ and $\epsilon_2$
are arbitrarily small positive numbers, there exists a positive number $J(\epsilon_1, \epsilon_2)$
such that, for all $T_i$ such that $i > J$,

$P\{|\hat{p}(g) - p| > \epsilon_1\} < \epsilon_2,$

when $n_{0i} \to \infty$. An important example of such a sequence is that of the Wald
sequential binomial tests [2] obtained as follows: Let $\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots$ and
$\beta_1, \beta_2, \ldots, \beta_i, \ldots$ be two sequences of positive numbers all of which are less
than 1/2 and which approach zero as $i \to \infty$. Let $p_0$ and $p_1$, $0 < p_0 < p_1 < 1$,
be two fixed numbers,

$c_1 = \log \frac{p_1}{p_0}$,  $c_2 = \log \frac{1 - p_1}{1 - p_0}.$

Finally let the rule for terminating the process of drawing observations be as
follows for the $i$th test $T_i$: The process of drawing observations terminates at
the smallest integer $n$ for which either

$Z_n = \sum_{j=1}^{n} z_j > \log \frac{1 - \beta_i}{\alpha_i}$  or  $Z_n < \log \frac{\beta_i}{1 - \alpha_i}.$

Since $(1 - \beta_i)/\alpha_i \to \infty$ and $\beta_i/(1 - \alpha_i) \to 0$ while $c_1$ and $c_2$ are constant, it is
evident that the hypothesis of the theorem is satisfied.

The property of being unbiased is not generally considered an indispensable 
characteristic of an optimum estimate, while consistency is generally so regarded. 
Our theorem shows that p(g) enjoys the latter property with respect to important 
sequences of sequential tests. 

Theorem: Let $T_1, \ldots, T_i, \ldots$ be a sequence of sequential binomial tests.
For the $i$th test $T_i$ let $n_{0i}$ be the smallest integer such that $P\{n = n_{0i}\} \neq 0$. Finally
let $n_{0i} \to \infty$ as $i \to \infty$. Then $\hat{p}(g)$ converges stochastically to $p$ as $i \to \infty$.
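The statement of the theorem can be explored numerically by computing $P\{|\hat{p}(g) - p| > \epsilon\}$ exactly for a family of regions whose smallest index grows. The sketch below is an illustration under stated assumptions: the Wald-type rule is truncated at a hypothetical maximum sample size `n_max` so that the lattice is finite (the truncation is not part of the rule described in the text), and the values $p_0 = 0.2$, $p_1 = 0.4$, $p = 0.3$, $\epsilon = 0.15$ are arbitrary.

```python
import math

def boundary_counts(stop, start):
    """Number of paths from `start` to each boundary point (x, y) of a bounded rule."""
    frontier, boundary = {start: 1}, {}
    while frontier:
        nxt = {}
        for (x, y), c in frontier.items():
            if stop(x, y):
                boundary[(x, y)] = boundary.get((x, y), 0) + c
            else:
                for pt in ((x + 1, y), (x, y + 1)):
                    nxt[pt] = nxt.get(pt, 0) + c
        frontier = nxt
    return boundary

def truncated_wald_rule(p0, p1, alpha, beta, n_max):
    """Stop when Z_n leaves (log(beta/(1-alpha)), log((1-beta)/alpha)) or n reaches n_max."""
    c1, c2 = math.log(p1 / p0), math.log((1 - p1) / (1 - p0))
    upper, lower = math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))
    def stop(x, y):
        z = y * c1 + x * c2  # y observations equal to c_1, x equal to c_2
        return z > upper or z < lower or x + y >= n_max
    return stop

def tail_probability(stop, p, eps):
    """P{|k*(g)/k(g) - p| > eps} when each observation is c_1 with probability p."""
    q = 1.0 - p
    k = boundary_counts(stop, (0, 0))
    k_star = boundary_counts(stop, (0, 1))
    return sum(
        c * p ** y * q ** x
        for (x, y), c in k.items()
        if abs(k_star.get((x, y), 0) / c - p) > eps
    )

if __name__ == "__main__":
    for err in (0.2, 0.05, 0.01, 0.001):
        rule = truncated_wald_rule(0.2, 0.4, err, err, 300)
        n0 = min(x + y for x, y in boundary_counts(rule, (0, 0)))
        print("alpha = beta =", err, " n0 =", n0,
              " tail =", tail_probability(rule, 0.3, 0.15))
```

The fixed-sample rule `lambda x, y: x + y == n` is a special case in which $\hat{p}(g)$ reduces to $y/n$, which gives a convenient check on the routine.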

Proof: For typographic simplicity we shall use $n_0$ as the designation of the
generic element of the sequence $n_{01}, n_{02}, \ldots$. No confusion will be caused
thereby.

Let $n' = n_0 - 1$, and $\delta_1 > 0$ and $\delta_2 > 0$ be arbitrarily small fixed numbers.
Let $k'(g)$ be the number of paths which end at the point $g$ and are such that
$|y'/n' - p| < \delta_1$, where $y'$ is the number of observations $c_1$ among the first $n'$
observations. We then have

Lemma 1. For $n_0$ sufficiently large

(1)  $\sum_{g \in B} k'(g)\,p^y q^x > 1 - \delta_2,$

where $B$ is the set of boundary points of $R$.

Proof: Consider the totality $\{h\}$ of all points $h = (x', y')$, with $x' + y' = n'$.
Here $x'$ and $y'$ denote, respectively, the number of observations $c_2$ and $c_1$ in the
sequence of the first $n'$ observations on $z$. Let $k_0(h)$ denote the number of paths



to $h$. Let $C$ denote the set of points $h$ such that $|y'/n' - p| < \delta_1$. If $n_0$ is
large enough we have, by the law of large numbers,

$\sum_{h \in C} k_0(h)\,p^{y'} q^{x'} > 1 - \delta_2.$

Let $k(h, g)$ be the number of paths from $h$ to $g$. From Theorem 2' of [3] it
follows that

(A)  $\sum_{g \in B} k(h, g)\,p^y q^x = p^{y'} q^{x'}.$

Also from the definitions of the various symbols involved it readily follows that

$k'(g) = \sum_{h \in C} k_0(h)\,k(h, g).$

Hence

$\sum_{g \in B} k'(g)\,p^y q^x = \sum_{g \in B} \Big(\sum_{h \in C} k_0(h)\,k(h, g)\Big) p^y q^x = \sum_{g \in B} \Big(\sum_{h \in C} k_0(h)\,k(h, g)\,p^y q^x\Big)$

$= \sum_{h \in C} k_0(h) \Big(\sum_{g \in B} k(h, g)\,p^y q^x\Big) = \sum_{h \in C} k_0(h)\,p^{y'} q^{x'} > 1 - \delta_2.$

This proves Lemma 1.

Let $\xi(g) = [k(g) - k'(g)]/k(g)$. Thus $\xi(g)$ is a chance variable, being a function
of the chance point $g$.

Lemma 2. Let $\delta_3$ and $\delta_4$ be arbitrarily small positive numbers. For $n_0$ sufficiently
large

(2)  $P\{\xi(g) \leq \delta_3\} > 1 - \delta_4.$

Proof: If (2) were not true, we would have

(3)  $\sum_{g \in B} k'(g)\,p^y q^x \leq (1 - \delta_4) + (1 - \delta_3)\delta_4 = 1 - \delta_3\delta_4.$

Choose the $\delta_2$ of Lemma 1 so that $\delta_2 < \delta_3\delta_4$. For some large value of $n_0$ we
would then have a contradiction between (1) and (3). This proves the lemma.

Let $g$ be any boundary point. Consider any path whose $y'$ is such that
$|y'/n' - p| < \delta_1$; let us call such a path one of type $T$. Consider the terminal
sequence $S$ of this path,

$S:\ z_{n_0}, z_{n_0+1}, \ldots, z_n.$

This sequence, together with $g = (x, y)$, uniquely determines $y'$. Any permutation
of $y'$ elements $c_1$ and $n' - y' = x'$ elements $c_2$ may serve as the initial sequence
of $n'$ observations of a path which terminates at $g$ and has the terminal sequence
$S$. For no boundary point is of index smaller than $n_0$, so that under permutation
of the first $n'$ observations a path remains a path, i.e., the process of taking
observations will not terminate prematurely as a result of the permuting of the
elements. Of these permutations a proportion $y'/n'$ begin with the element $c_1$.
We deal in this manner with all the different terminal sequences of the paths of





type $T$ which end at $g$. Let $k^{*\prime}(g)$ be the number of these which begin with $c_1$.
We obtain

Lemma 3. For all $g$ such that $k'(g) \neq 0$

$\left|\frac{k^{*\prime}(g)}{k'(g)} - p\right| < \delta_1.$

Putting Lemmas 2 and 3 together we have

Lemma 4. As $n_0 \to \infty$, $k^{*\prime}(g)/k(g)$ converges stochastically to $p$.

Now it follows in a manner similar to that of Lemma 2 that, as $n_0 \to \infty$,
$k^{*\prime}(g)/k^*(g)$ converges stochastically to one. This, together with Lemma 4,
proves the theorem.


REFERENCES 

[1] M. A. Girshick, Frederick Mosteller, and L. J. Savage, “Unbiased estimates for
certain binomial sampling problems, with applications,” Annals of Math. Stat.,
Vol. 17 (1946), pp. 13-23.

[2] A. Wald, “Sequential tests of statistical hypotheses,” Annals of Math. Stat., Vol. 16
(1945), pp. 117-186.

[3] J. Wolfowitz, “On sequential binomial estimation,” Annals of Math. Stat., Vol. 17
(1946), pp. 489-492.



BOOK REVIEWS 

Mathematical Methods of Statistics. Harald Cramér. Uppsala, Sweden:
Almqvist and Wiksell, 1945. pp. xvi, 575. (Princeton, N. J.: Princeton
University Press, 1946. $6.00)

Reviewed by Will Feller 
Cornell University 

This book represents a contribution of a novel kind to the statistical literature 
and will render valuable services both as textbook and reference book. Of its 
three parts the first one (134 pages) is entitled Mathematical Introduction and 
develops the necessary formal mathematical tools. The second part (186 pages) 
is devoted to Random Variables and Probability Distributions, that is to say, to a
chapter of the modern theory of probability. The third, and main, part of the
book (some 233 pages) is entitled Statistical Inference. Ordinarily these three
topics would require consultation of three or more books, and these would rarely
be found on the same shelf. However, the masterly exposition succeeds in creating
the impression of natural unity and harmony. The ideas are developed with
elegance and apparent ease as if the line of presentation followed a well explored 
path. The uninitiated will not notice how unconventional the treatment is and 
how the very selection of topics depends on the author's scientific personality. 

It is hardly necessary to point out that Cramér’s book fills an urgent need.
The emergence of statistical theory and methodology as an exact science, firmly
grounded in mathematical probability, is only of recent date. Its rapid development
went hand in hand with an extraordinary increase of the number and importance
of its various applications. Under such circumstances there was
naturally little time for an exposition of the theoretical foundations and ramifications.
Modern statistical inference has its roots in the classical limit theorems
of probability. Now classical probability used to consist of a bewildering
collection of special and mutually uncorrelated problems; unified guiding principles
and methods are a rather new development and have not yet found expression
in the textbook literature. The original investigations are usually written in an
exceedingly abstract language and the existing close ties to applications are not
apparent. Consequently, there is no easy access either to probability or statistics
and it is often difficult to establish whether, or to what extent, various assertions
have actually been proved. The present book therefore closes a serious
gap in the literature and will greatly facilitate both teaching and research.

Of the 12 chapters of the Mathematical Introduction 9 are devoted to the theory 
of measure and integration. The antiquated theory of the so-called Riemann 
integral (kept alive by elementary textbooks) considered only point functions 
$y = f(x)$, where the independent variable is a point. The temperature at a
given point or the velocity at a given moment are typical examples. Many 
mathematical considerations simplify greatly if from the very beginning also set
functions $y = F(A)$ are introduced, where the independent variable is a set.
Typical examples are mass in mechanics, the amount of heat or of electricity,
area or wealth of a geographic region, and the probability of events (i.e. sets in
sample space). The Lebesgue-Stieltjes theory frees the concept of integral from
artificial devices and reduces it to the natural notion of mean values with respect
to set functions. In a simile, believed to be due to Lebesgue, the Riemann integral
corresponds to the procedure of a grocer who computes the day's receipts
by actually adding the several amounts in the order as they had come in. The
Lebesgue procedure imitates the more intelligent grocer who orders his cash in
piles of notes and coins of equal denomination and counts them. The analogy
with the customary procedure of computing mathematical expectation is clear.
The Lebesgue-Stieltjes integral is conceptually simpler than the Riemann integral
and can be presented in as simple a way with rigor adequate for elementary textbooks.
It has become an indispensable tool in probability, statistics, physics,
and other applied fields. Since it has, unfortunately, not found its way into
calculus textbooks, physicists are compelled to use the less flexible notion of the
Dirac δ-function, and the formal mathematical apparatus in general becomes
unnecessarily clumsy. It is a curious anomaly that so many calculus textbooks
profess to be written with a view to applications and yet completely disregard
the most obvious practical needs and that the teaching of practical mathematics
should remain uninfluenced by the great developments of the last fifty years.
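
The grocer simile can be made concrete with a small sketch. The receipts below are invented for illustration and the variable names are hypothetical; the point is only that adding in order of arrival and counting piles of equal denomination give the same total, and that the second route is the familiar formula for a mathematical expectation.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical day's takings, in cents, in the order received (illustrative data only).
receipts = [25, 100, 5, 25, 25, 100, 10, 5, 25, 100]

# First grocer: add the amounts one by one in the order they came in.
total_in_order = sum(receipts)

# Second grocer: sort the cash into piles of equal denomination, count each pile,
# and add (value of denomination) x (size of pile).
piles = Counter(receipts)
total_by_piles = sum(value * count for value, count in piles.items())

# The same two routes to a mean value: average the observations directly, or
# weight each distinct value by its relative frequency, as in E[X] = sum x * P(X = x).
n = len(receipts)
mean_in_order = Fraction(total_in_order, n)
mean_by_piles = sum(Fraction(value * count, n) for value, count in piles.items())
```

Exact fractions are used for the means so that the two routes can be compared without rounding.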

In such circumstances the chapters on integration will be particularly welcome
to statisticians as probably the only place in the literature where they will find 
easy access to the theory. Of course, this exposition leads far beyond what the 
average statistician will require under ordinary circumstances and beyond the 
necessary prerequisites of the main body of the book. Of the 88 pages roughly 
half can be omitted at first reading in accordance with detailed instructions given 
in the Preface. The remaining half will form a valuable reference book for 
theorems and tools used occasionally in connection with more delicate parts of 
statistical theory. The mathematical introduction contains also a chapter on 
Fourier integrals (characteristic functions), one on matrices and quadratic forms, 
and finally miscellaneous complements such as orthogonal polynomials, Euler's 
summation formula, beta and gamma functions, etc. 

The title to the second part, Random Variables and Probability Distributions,
is the same as that of the author's well-known Cambridge Tract of 1937. Both
start with a discussion of the foundations along axiomatic lines. The new treatment
does not differ essentially from the old one, but some changes are introduced
which are regrettable in the reviewer's opinion (in particular axiom 3).
Otherwise there is practically no overlap between the two expositions. The 1937
booklet devoted much space to the asymptotic expansions connected with the
central limit theorem which are due to the author himself. This topic is not
touched upon in the present book. This is a judicious procedure since the 1937
booklet is generally accessible (although at present sold out). Instead we now
find a detailed study of some univariate distributions such as $\chi^2$, Student's $t$,
Fisher's $z$, the Pearson system, etc., none of which were mentioned in the Cambridge
tract. Similarly, there is now a section on correlation and regression,
and the normal distributions in several variables. The theory of probability is 
developed only to the extent of the formal theory of distribution functions. This 
implies that even so important a notion as stochastic convergence is treated only 
summarily while the strong law of large numbers falls completely outside the 
framework of the book. This is regrettable inasmuch as the strong law is of 
greater importance than the classical weak law (whose fame rests essentially on 
a classical misunderstanding). It should be mentioned that this second part of 
the book contains some 39 well chosen illustrative exercises the solution of which 
is left to the reader. 

In the main part of the book, entitled Statistical Inference, the outer form
changes inasmuch as the text there is accompanied by numerous practical examples.
However, the exposition remains mathematical in nature and the main
emphasis rests on exact formulations; much attention is paid to the establishment
of the precise conditions of validity of the individual theorems, their logical
interrelations and their connections with general probability. The expert will
find many minor and major improvements in formulations and proofs. They are
too numerous to be listed here. Suffice it to point out, as a typical example, the
theorem on pp. 426-27 concerning the limiting form of the $\chi^2$ distribution with
estimated parameters; this theorem appears to be more general than usually
stated and also the proof seems to be novel. The topics treated in the statistical
part of the book will be seen from the following list of titles to the chapters.

25. Preliminary Notions on Sampling.
26. Statistical Inference (general orientation).
27. Characteristics of Sampling Distributions (moments, semi-invariants, corrections for grouping, etc.).
28. Asymptotic Properties of Sampling Distributions (moments, extreme values, range, etc.).
29. Exact Sampling Distributions (degrees of freedom, Student, Fisher, correlation and regression coefficients, partial and multiple correlations, generalized variance, etc.).
30. Tests of Goodness of Fit and Allied Tests (treating mostly applications of $\chi^2$).
31. Tests of Significance for Parameters.
32. Classification of Estimates (sufficient, efficient and asymptotically efficient estimates; minimum variance, etc.).
33. Methods of Estimation (method of moments, maximum likelihood, $\chi^2$-minimum methods).
34. Confidence Regions.
35. General Theory of Testing Statistical Hypotheses.
36. Analysis of Variance.
37. Some Regression Problems.

There follow tables of the normal distribution, the $\chi^2$ and the $t$-distributions, and a long
list of references.

If an expression of wishes for a second edition were permitted, most statisticians
would probably give first choice to non-parametric and sequential tests.
It is needless to point out that the latter became public only after completion of
the Swedish edition of the present book.

Even this short account will show the extremely wide range of topics and
theories covered in the book, from abstract integration to randomized experiments.
They are all presented with uniform lucidity. The exposition throughout
is formal, and yet inspiring, rigorous and yet never pedantic. It will serve
as an example worthy of imitation and is an achievement on which the author
deserves our sincere congratulations.


The Advanced Theory of Statistics. Vols. I and II. Maurice G. Kendall. 

London: C. Griffin and Co., Ltd. Vol. I. Second ed. revised, 1945; pp. xii, 

457, 50 shillings. Vol. II. 1946; pp. viii, 521; 42 shillings. 

Reviewed by M. S. Bartlett 
Cambridge University and The University of North Carolina 

With the recent appearance of the second volume, it is now possible to review 
as one work this comprehensive treatise. To quote the author’s opening remarks
to the Preface to Volume I: “The need for a thorough exposition of the
theory of statistics has been repeatedly emphasized in recent years. The object 
of this book is to develop a systematic treatment of that theory as it exists at the 
present time.” An outline of the contents, which in the two volumes make up 
just on a thousand pages, will indicate that this formidable task has been squarely 
faced by the author, who, when a tentative co-operative venture of writing such 
a treatise was upset by the outbreak of the war, continued alone with the project. 

Volume I contains sixteen chapters. The first six introduce the concept of 
frequency distributions via observational data on groups and aggregates, and 
their mathematical representation (Ch. 1), measures of location and dispersion 
(Ch. 2) and moments and cumulants in general (Ch. 3), characteristic functions 
(Ch. 4), and ending with a description of the standard distribution functions, such 
as the binomial, Poisson, hypergeometric and normal distributions, and the 
Pearson and Gram-Charlier systems. The next section opens with probability 
(Ch. 7) and proceeds to sampling theory (Chs. 8-11), including a chapter (Ch. 10) 
on exact sampling distributions, many of the standard sampling distributions 
being used in this chapter to illustrate the mathematical methods available for 
obtaining sampling distributions. Chapter 11 deals with the general sampling 
theory of cumulants, including a useful reference list of formulae and a demonstration,
due to the author, of the validity of Fisher’s combinatorial rules for
obtaining these formulas. The section concludes with a chapter on the Chi-square
distribution and some of its applications. The last four chapters of
Volume I deal with association and contingency, correlation, including partial
and multiple correlation, and rank correlation; this last chapter being a comprehensive
treatment including comparatively recent results of the author.

It will be convenient to list also the contents of Volume II before any critical 
comment on either volume. The first section of the second volume comprises 
four chapters on the theory of estimation, including a derivation of the properties 
of the maximum likelihood estimate (Ch. 17) and separate chapters on Fisher’s 
theory of fiducial probability and Neyman’s theory of confidence intervals. The
second main section, according to the author’s remarks in the preface to Volume
II, deals with the theory of statistical tests and comprises chapters 21, 23, 24, 26,
27 and 28; of these, after an introductory chapter (Ch. 21) on tests of significance,
chapters 23 and 24 cover analysis of variance, chapters 26 and 27 give a fairly 
detailed account of the general theory of significance-tests originated by Neyman 
and Pearson and Chapter 28 deals with the recently developed techniques of 
multivariate analysis. The remaining chapters are 22 on regression, 25 on the 
design of sampling enquiries, and Chapters 29 and 30 on time-series, another 
subject in which the author has himself taken an active interest. Finally, there 
are two appendices, A consisting of a few addenda to Volume I, and B an extensive
bibliography of theoretical statistical papers.

The volumes are attractively printed; and each chapter concludes with a useful 
collection of examples for the reader. 

In any comprehensive treatment of a wide subject there can be no clearly defined
order of presentation; nevertheless, the author’s order of chapters in Volume
II and in particular his inclusion of analysis of variance among the chapters on the 
theory of statistical tests is a little puzzling, and the reviewer’s preference would 
have been to see this important subject treated earlier, together with regression 
analysis, and their link with the classical method of least squares more firmly 
outlined. Incidentally, there appears to be no mention of the Fourier analysis 
of observational data except in its relation to periodogram analysis (Ch. 30). 
This change of order would perhaps also have allowed a shift forward of Chapter
25 on the design of sampling enquiries, and a more compact section on multiple 
correlation, culminating with the chapter on multivariate analysis before the 
chapters on the general theory of statistical inference were begun. 

Another arrangement of rather doubtful value in Volume II is the allocation of 
separate chapters to fiducial probability and to the theory of confidence intervals.
The problem of how to deal with a field which is still a battleground is
admittedly not an easy one, and this particular one is an embarrassment at 
present to many teachers, but it may be questioned whether strict impartiality 
is the best answer. To take a hypothetical example, there would seem to be no 
particular virtue in a textbook which expounded, in parallel, statistical methods 
of inference using direct probabilities and the method of “inverse probability”, 
leaving the reader to decide at the end which he should adopt. 

The most criticizable arrangement, however, occurs in Volume I with the late 
and rather scanty treatment of probability in Chapter 7. To begin with examples
of statistical data is sound, but since the whole conceptual model erected
to deal with such data is based on probability theory, it does not seem sufficient 
for a reader who “feels keenly on the subject” to do as the author suggests in the 
Preface and read Chapters 7 and 8 after Chapter 1. Even if he does so, he will 
find no very clear exposition of the statistical theory of probability,—no mention, 
for example, of the laws of large numbers, whether for simple dichotomies or for
entire continuous distribution functions, that show how the conceptual model 
adequately corresponds with the empirical notions of “in the long run” or “for
a large enough sample”. The actual arrangement, moreover, leads to an apparently
rather arbitrary treatment of theorems on limiting distributions;
the First Limit Theorem, which deals with the equivalence of the limits of distribution
function and corresponding characteristic function sequences, is given
in the chapter on characteristic functions (Ch. 4), and the Central Limit Theorem,
dealing with the convergence to normality of a sum of $n$ independent random
variables, is given in the chapter on probability.

In the proof of the second part of the First Limit Theorem, dealing with the
conditions under which a sequence $\phi_n(t)$ of characteristic functions determines
the limiting distribution function $F(x)$, the author has not yet corrected an error
that occurred in Cramér’s original version, which Kendall follows (section 4.12).
Correct conditions for convergence of the distribution function sequence $F_n(x)$ to
$F(x)$ (at all continuity points of $F$) are convergence of the characteristic function
sequence to $\phi(t)$ for all real $t$, uniformly in at least some finite $t$ interval (cf. H.
Scheffé, Math. Reviews, Vol. 6 (1945), p. 89).
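
A standard illustration, not taken from Kendall's text or from the note cited above, suggests why some uniformity requirement enters; it assumes the interval in question is taken about $t = 0$, and writes $\Phi$ for the standard normal distribution function. Let $F_n$ be the normal distribution function with mean zero and variance $n$:

```latex
F_n(x) = \Phi\!\left(\frac{x}{\sqrt{n}}\right), \qquad
\phi_n(t) = e^{-n t^{2}/2} \;\longrightarrow\;
\begin{cases} 1, & t = 0, \\ 0, & t \neq 0, \end{cases}
\qquad (n \to \infty).
```

The sequence $\phi_n(t)$ converges for every real $t$, but not uniformly in any interval containing $t = 0$, and the limit is discontinuous at the origin and so is not a characteristic function. Correspondingly $F_n(x) \to 1/2$ for every $x$, which is not a distribution function.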

Another proof in Volume I which appears to need clarification is the geometrical
derivation of the distribution of the multiple correlation coefficient in the case
of a non-zero true correlation (section 15.21). The blunt statement is made,
following equation (15.51), that the sample correlation coefficient $R$ and an angle
$\theta$ (defined in the text) are independent, a statement which is incorrect. However,
if the logic of Fisher’s original derivation is examined, it turns out that the
relation of $R$ and $\theta$ is only required when the true correlation is zero; under such
conditions $R$ and $\theta$ are independent.

In Volume II there is a sentence requiring correction and amplification in the
derivation (in the case of zero true canonical correlations) of the sampling canonical
correlation distribution (section 28.30). The sentence “Consider the distribution
for a given value of $l_{ij}$ and $z_{ij}$ ...” should be corrected to read “Consider
the distribution for a given value of $l_{ij} + z_{ij}$ ...”. Some justification that
the distribution is independent of $l_{ij} + z_{ij}$ is then still needed.

There is inevitably, owing to the time the book was written, no mention of 
sequential analysis, the sampling technique developed during the war by Wald 
and others and only recently “derestricted”. Again, in chapter 18, where the 
work of Aitken and Silverstone on unbiased estimates with minimum variance is
referred to, the simple inequality connecting the variance of any unbiased estimate
with Fisher’s information function throws an interesting new light on this
aspect of the estimation problem (see, for example, H. Cramér, Mathematical
Methods of Statistics, section 32.3, or C. R. Rao, Bulletin Calcutta Math. Soc.,
Vol. 37 (1945), p. 81), but was not known to the author when this chapter was
written. Such omissions are merely an indication of the developing nature of the
subject, and it is hoped they can be remedied in later editions. There is, however,
especially in Volume II, an occasional impression of patchiness in the treatment
not altogether excusable on such grounds. This can perhaps be illustrated
from the last chapter, a valuable contribution to the still-growing subject of 
time-series, but where the importance of some known results does not always
seem sufficiently stressed; in particular, the Wiener-Khintchine relation between 
the periodogram and correlogram is noted (section 30.68) as “an interesting relation”,
whereas it is a fundamental relation in the modern method of approach
to time-series, giving much deeper insight into the correct interpretation of 
classical periodogram analysis. 
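
For reference, the inequality mentioned in connection with chapter 18 can be written as follows in its simplest one-parameter form. The notation ($T$, $f$, $I(\theta)$, $n$) is assumed here for illustration and is not that of the book under review; the usual regularity conditions are taken for granted.

```latex
\operatorname{Var}_{\theta}(T) \;\ge\; \frac{1}{n\, I(\theta)},
\qquad
I(\theta) = \mathrm{E}_{\theta}\!\left[\left(\frac{\partial}{\partial\theta}
            \log f(X;\theta)\right)^{2}\right],
```

where $T$ is any unbiased estimate of $\theta$ based on $n$ independent observations with density $f(x;\theta)$. For a normal mean with known variance $\sigma^2$, for example, $I(\theta) = 1/\sigma^2$, so the bound is $\sigma^2/n$, which the sample mean attains.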

These criticisms, which could be extended to cover minor errors and misprints,
are not intended to detract seriously from what is a remarkable achievement.
An excellent sense of proportion has been maintained throughout between
mathematical theory and illustrative discussion and examples. This makes
this treatise, if both the breadth and level of the subject matter are taken into 
account, at present unique. It will be an indispensable reference book to every 
teacher and advanced student of the theory of statistics. 


Sequential Analysis of Statistical Data: Applications. Prepared by the
Statistical Research Group, Columbia University for the Applied Mathematics
Panel, National Defense Research Committee, Office of Scientific Research
and Development. SRG Report 255, Revised; AMP Report 30.2R,
Revised. New York: Columbia University Press, September 1945. pp. vii,
17; iv, 80; v, 57; iii, 25; iii, 18; iii, 39; ii, 41. $6.25. (London: Oxford University
Press, 1946.)


Reviewed by John W. Tukey 
Princeton University 

Many of the features of this compendium are familiar to most of the readers of 
this review, but for the benefit of the others I shall enumerate them briefly. It 
consists of a heavy looseleaf binder containing 7 booklets of distinctive colors— 
each saddle stitched and usable separately. It is the last word (to date) in presenting
sequential analysis to the statistician who may wish to use it in practice.
It covers five elementary cases (each in a booklet, the two others being used for 
introduction and appendices): 

Acceptance or rejection by percent defective (Sec. 2) 

Comparative percent satisfactory (Double dichotomy) (Sec. 3) 

Acceptance or rejection by the adequacy of the mean (with known variability) 
(Sec. 4) 

Acceptance or rejection by the exact value of the mean (with known variability) 
(Sec. 5) 

Acceptance or rejection by the smallness of the variability (Sec. 6) 

These cases are covered in complete detail, with illustrative examples, tables and 
charts. A copy should be accessible to every teacher of statistics and to every 
statistician in industry or experimental work who can propose new techniques of 
testing.

With this general introduction let us go on and explain what the reader will 
not find and what further work in this line the reviewer awaits with keen interest. 
The classical testing procedure was to test a sample of predetermined size and 
then decide to accept or reject. Long ago curtailed sampling and double sampling
were developed to cut corners legitimately and reduce inspection costs.
There are two situations, each more frequent in war than in peace, where it is 
clearly desirable to reduce the average number of items tested to a minimum: 

(I) Where essentially all lots are accepted and the test is destructive so that the 
items tested are the main loss of production, or 
(II) Where the cost of testing an item is large in comparison with the cost of 
production. 

Subject to a practically unimportant allowance for the finite size of the lots, and 
to an allowance of unknown importance for the quality of lots presented, the 
methods of sequential analysis minimize this average number among all methods 
so far considered. When situation (I) or (II) holds without modifying complications,
then, the best known method is sequential analysis, the natural descendant
of double sampling. Otherwise, the situation is far from clear, and much judgement
is involved in setting up a practically efficient scheme. The reader will
get no help on this problem of judgement, nor in the problem of setting risks from 
the book under review—he will get every needed help with the mathematical 
problem of setting up a sequential plan to meet chosen risks, including complete 
tables of all necessary functions, including natural logarithms. 
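
As a rough sketch of the kind of plan in question, the following implements Wald's sequential probability ratio test for a fraction defective in its usual textbook form. It is not taken from the report, which works from tables and charts; the function name and the numerical values of p0, p1, alpha and beta are assumptions chosen only for illustration.

```python
import math

def sprt_binomial(items, p0, p1, alpha, beta):
    """Sequential probability ratio test for a fraction defective.

    items: iterable of 0/1 values (1 = defective), inspected one at a time.
    p0 < p1: acceptable and rejectable fractions defective (assumed values).
    alpha: chosen risk of rejecting a lot of quality p0.
    beta: chosen risk of accepting a lot of quality p1.
    Returns (decision, number_inspected).
    """
    upper = math.log((1 - beta) / alpha)      # reject at or above this limit
    lower = math.log(beta / (1 - alpha))      # accept at or below this limit
    step_defective = math.log(p1 / p0)        # added for each defective item
    step_good = math.log((1 - p1) / (1 - p0)) # added for each good item

    log_ratio = 0.0
    inspected = 0
    for x in items:
        inspected += 1
        log_ratio += step_defective if x else step_good
        if log_ratio >= upper:
            return "reject", inspected
        if log_ratio <= lower:
            return "accept", inspected
    # Items ran out before either limit was reached.
    return "continue", inspected

# Hypothetical plan: p0 = 0.05, p1 = 0.20, alpha = 0.05, beta = 0.10.
all_good = sprt_binomial([0] * 50, 0.05, 0.20, 0.05, 0.10)
all_defective = sprt_binomial([1] * 50, 0.05, 0.20, 0.05, 0.10)
```

With these assumed values an unbroken run of good items leads to acceptance at the 14th item, and an unbroken run of defectives to rejection at the 3rd; a decision can only fall at a whole number of items.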

There is no reason to suppose that sequential analysis is the last word in testing 
procedures for the general problem of efficient testing, but what should be the 
next step ahead is not a step for the mathematical statistician. What is needed 
now is a careful analysis, by the operational research techniques so useful during 
the war, of a half-dozen industrial testing situations to determine what properties 
of the testing procedure are involved in cost and to what extent. Do we want 
the minimum average sample size, the minimum average square of the sample 
size—or what? With this there should go a corresponding operational study of 
the advantages of different OC curves, including those of what now seems to be 
a peculiar shape. Given these studies, we could put the problem in mathematical 
statistics to the mathematical statistician which he would then solve. But with 
the present lack of operational research groups in industry, it is probable that 
we will proceed in an unnatural way, and that the mathematical statistician will 
take the next step forward. For reasons of mathematical simplicity it is not 
unlikely that the sample plan with the minimum average squared sample size 
will come next. 

The credit for the book is clearly assigned on the inside cover of each pamphlet 
in the following words: “So many members of the Statistical Research Group 
(Columbia) have participated in the preparation of this report, a previous edition
of which was prepared by H. A. Freeman, that its authorship is attributed
to the group as a whole. The responsibility for planning and preparing this
edition has been shared by H. A. Freeman, M. A. Girshick, and W. Allen Wallis, 
with the cooperation of Kenneth J. Arnold, Milton Friedman, Edward Paulson, 
and others. The theory of sequential analysis is mainly the work of A. Wald.” 

It may be of interest to notice a few minor points for the record. On page 
1.01 it is indicated that 100% inspection is 100% effective— this seems far from 
industrial experience. Another badly needed set of operational studies would be 
on the influence of the sampling plan on inspector’s inspection. On page 2.27, 
the footnote suggests that when a tabular procedure is used instead of 
a graphic one, that more decimal places should be kept—the logic of this is not 
clear. On page 4.14 it is stated that “similarly, if all patches had tested 400 
minutes, the experiment would have terminated at 9.4 ...” Clearly no such
experiment can terminate after a fractional number of tests. On page A.09 
it is stated that “Finally it should be mentioned that truncation of any kind 
ought generally to be avoided”. This seems to the reviewer to be a rash statement,
for when not only average sample size but all other properties entering into
the practical efficiency of a sampling plan are considered, this decision will almost 
certainly be reversed. The relatively small number of these detailed points is 
an evidence of careful and competent workmanship. 

A footnote to the Appendix (B) on some principles of sequential analysis states: 
“Any mathematician who may stray into this Appendix should be assured that 
the validity of the conclusions in no case depends upon the type of reasoning 
presented here; indeed, even for intuitive or heuristic arguments mathematicians 
may prefer those given in SRG 75”. This warning and caveat seems unduly 
strong—the appendix is recommended to all mathematically minded newcomers 
to sequential analysis. 

The same appendix warns the reader in a few places that the theory set forth 
does not allow for the fact that samples come in units. If the reader tries to 
apply the theory to cases far from normal inspection practice, for example with 
risks of 0.25 and average sample sizes of 12, he will then find out that this does 
occasionally make a difference. In conventional circumstances the approximation
will not bother him.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Mr. Kurt W. Back has accepted a position with the Research Center for Group 
Dynamics, Massachusetts Institute of Technology. 

Mr. Stanley D. Canter was discharged from the Army in October and has been 
enrolled as a graduate student in mathematical statistics at Columbia University. 

Mr. William W. Cooper has accepted a position at Carnegie Institute of Technology,
Pittsburgh.

Mr. Robert Dorfman is enrolled as a graduate student in the Department of 
Economics, University of California, Berkeley, and is also serving as a teaching 
assistant in that department. 

Dr. Nicholas Fattu, formerly at Michigan State College, has accepted a teaching
position at Indiana University, Bloomington.

Mr. John P. Gill is now Chief of the Research and Progress Analysis Division, 
War Assets Administration, Houston Regional Office, Texas. 

Dr. Clausin D. Hadley has accepted a position with the Graduate School of 
Business, Stanford University. 

Mr. Malcolm H. Henry is now Assistant Statistician in the Statistical Department
of the Michigan State Department of Social Welfare, Lansing.

Dr. Alston S. Householder has accepted a position as Principal Physicist with 
the Monsanto Chemical Company, Clinton Laboratories, Oak Ridge, Tennessee. 

Mr. Morton Kramer is now with the Office of International Health Relations, 
U. S. Public Health Service, Washington. 

Mr. F. C. Leone, who was discharged from the military service in the fall, has 
returned to his former position in the Department of Mathematics at Purdue
University, Lafayette, Indiana. 

Mr. Philip J. McCarthy, formerly at Princeton University, is now at Cornell 
University, Ithaca, New York. 

Mr. Edward C. Molina has been named special lecturer in Mathematics at 
Newark College of Engineering, in addition to Dr. Emil J. Gumbel, previously 
mentioned. 

Mr. Nicholas Pastore has accepted a position in the Department of Mathematics,
City College of New York.

Dr. William S. Robinson is now Assistant Professor of Sociology and Statistics, 
University of California at Los Angeles. 

Dr. Leonard J. Savage, who has a Special Rockefeller Fellowship, is spending 
the academic year at the Institute of Radiobiology and Biophysics, University of 
Chicago.

Professor Dunham Jackson died at Minneapolis on November 6, 1946. From 
1919 until 1946 Mr. Jackson was Professor of Mathematics at the University of 
Minnesota, and in 1946 was named Professor Emeritus. 

Professor Charles C. Wagner died suddenly on May 23, 1946, at the age of 52. 
He was acting dean of the College of Liberal Arts of Pennsylvania State College 
when he died. 


Those interested in the work of the Mathematical Tables Project, will, upon 
request, be placed on the mailing list for copies of the monthly progress reports, 
issued by the Project. Requests should be addressed to Dr. Arnold N. Lowan, 
150 Nassau St., New York, N. Y. 


Statistical Research Laboratory, University of Michigan 

Several developments in instruction and research in the general field of statistics
are in progress at the University of Michigan.

At the beginning of the current academic year the new Statistical Research 
Laboratory was opened. It is planned that this unit, which is a division of the 
Graduate School, will serve as the center for research employing statistical methods
and for research in statistical methodology. Free consultation and advice
on statistical matters are offered to all members of the University engaged in
research and the latest types of computing machines are available for their use at
no cost to them. Or the Laboratory will undertake, at fees to cover costs, computing
and the analysis of data for such individuals or units of the University.
The Laboratory will have available the services of the University’s completely
equipped Sorting and Tabulating Station and expects to continue to provide a
center for the most efficient computing service as improved machines are developed.
The technical assistants employed by the Laboratory will be advanced
students of statistics who will thus have the opportunity to supplement their
training with experience with actual statistical investigations. Professor
C. C. Craig as Director and Professor P. S. Dwyer are in charge of the new laboratory,
each on a half-time basis.

The new Laboratory is a research and not a teaching unit and is distinct from 
the large statistical laboratories for the use of students in statistics courses already 
in existence on the campus. With respect to instruction in theoretical statistics, 
the curriculum in that subject in the Mathematics Department has recently been 
revised and extended to include twenty-four semester hours at the undergraduate 
and graduate levels in addition to courses in probability, finite differences, 
graphical methods, and quality control. The somewhat related professional 
program in actuarial mathematics has likewise been strengthened. The teaching 
staff for these two curricula includes Professors H. C. Carver, A. H. Copeland, 
C. C. Craig, P. S. Dwyer, C. H. Fischer, and C. J. Nesbitt. 

A number of postwar research programs whose pursuit involves the use of
probability and statistical methods have been established at the University of
Michigan. Of especial interest is the new Survey Research Center under the
leadership of Professors R. Likert and A. A. Campbell who will continue activities
begun by their group in Washington in the Department of Agriculture. Research
by survey methods in the social sciences for public and private agencies
and in survey methods themselves will be pursued and in addition a training
program combining formal courses and apprenticeship in the Center is being
set up.


New Members 

The following persons have been elected to membership in the Institute: 

Albert, George E., Ph.D. (Wisconsin) Head, Mathematics Division, Research Dept., Naval 
Ordnance Plant, Indianapolis, Ind., 1104 N. Oakland Ave. 

Ament, Richard P., B.A. (Cornell) Scientific Aid, 2129 20th St., N., Arlington, Va.

Bennett, Myra S. (Mrs. C. A.), A.B. (Michigan) Office Mgr., Institute of Math. Stat.,
Rackham Bldg., Ann Arbor, Mich., P. O. Box #8, Saline.

Blankmeyer, Edith, A.B. (Western College) Stat., Res. Dept., National Broadcasting Co.,
30 Rockefeller Plaza, New York 20, N. Y. 

Blyth, Colin, Jr., M.A. (Queen’s Univ. and Univ. of Toronto) Graduate student, Univ. of 
N. Car., Chapel Hill, N. Car., 209 Mangum Dormitory. 

Brown, Philip, B.S. (Pittsburgh) Stat., R. 329 Standard Oil Bldg., 3rd and Constitution 
Aves., Washington, D. C. 

Bruno, O. P., B.M.E. (New York Univ.) Chief, Methods Section, Ballistic Research Labs., 
Aberdeen Proving Ground, Md. 

Carrier, Norman H., M.A. (Cantab) Civil Servant, Mathematical Statistics Section of
Chief Scientific Adviser’s Division, Ministry of Works, c/o Westminster Bank, Palmers
Green, N. 13, London, England.

Chand, Uttam, M.A. (Punjab Univ., India) Graduate student, Univ. of N. Car., Chapel 
Hill, N. Car., 112 Mangum Dormitory. 

Crow, Edwin L., Ph.D. (Wisconsin) Mathematician, Science Dept., Res., Devel., and Test 
Organization, USNOTS, Inyokern, Calif. 

Dang, Mary, M.A. (California) Graduate student, Columbia University, New York 27,
N. Y., Box 267, Johnson Hall. 

Ens, Catherine C., B.S. (Dayton) Stat. Res. Ass’t, Graduate School, Ohio State
University, Columbus, Ohio, 267 Fifteenth Ave., Columbus 10.

Fox, William H., Ph.D. (Indiana) Ass’t Prof. of Educ. and Ass’t Director of Res. and
Field Service, Indiana Univ., Bloomington, Ind., 729 E. Hunter. 

Geisler, Murray A., M.A. (Columbia) Operations Analyst, Headquarters Army Air
Forces, 222 N. Piedmont St., Arlington, Va. 

Gershenson, Charles P., B.B.A. (C.C.N.Y.) Res. Assoc., Institute of Psychological Res., 
Box 180, Teachers College, New York 27, N. Y.

Gilford, Leon, A.B. (Brooklyn) Econ. Analyst, Census Bureau, Washington, D. C., 1410 
19th St., S. E. 

Goudsmit, S. A., Ph.D. (Leyden) Prof. of Physics, Northwestern Univ., Evanston, Ill.

Halperin, Max, M.S. (Iowa) Graduate student, Univ. of N. Car., Chapel Hill, N. Car., 211 
No. Columbia. 

Halperin, Sidney L., Ph.D. (Ohio State) Psychologist, Neuropsychiatric Institute, Univ. of 
Mich. Hospital, Ann Arbor, Mich., 2401 Pittsfield Blvd., Pittsfield Village.

Herbach, Leon H., A.B. (Brooklyn) Sub. Instr., Dept. of Math., Brooklyn Coll., N. Y.,
1926 64th St., Brooklyn 4.

Hoeffding, Wassily, Ph.D. (Berlin) 151 West 88 St., New York 24, N. Y.

Huhndorff, Roland F., B.S. (St. Mary’s Univ.) Ass’t to Ass’t Chief Chemist, The Texas 
Co., Res. Lab., Port Arthur, Texas. 





James, William C., A.B. (Knox Coll.) Director, Stat. Div., National Safety Council, 20 
N. Wacker Dr., Chicago 6, Ill., 7885 So. Dobson Ave., Chicago 19. 

Lev, Joseph, Ph.D. (Cornell) Ass’t Civil Service Examiner, N. Y. C. Civil Service 
Comm., and Lecturer, Teachers College, Columbia Univ., N. Y., 8550 Forest Parkway, 
Woodhaven 21.

Linder, Arthur, Ph.D. (Bern) Prof. of applied math. stat., University of Geneva,
Switzerland, Avenue de Champel 24.

Lord, Frederic M., M.A. (Minnesota) Ass’t Director, Graduate Record Examination, 437 
West 59th St., New York 19, N. Y., 158 W. 63rd St.

Marshall, Herbert, B.A. (Toronto) Dominion Stat., Dominion Bureau of Statistics,
Ottawa, Canada. 

Meacham, Alan D., Supv., Sorting and Tabulating Station, and Lecturer, School of Bus. 
Adm., Univ. of Mich., Ann Arbor, Mich., 114 Rackham Bldg. 

Miller, Irving, B.S. (C.C.N.Y.) Stat., Bureau of Labor Stat., Washington, D. C., 1900 
Biltmore St., N. W., Washington 9. 

Nanda, D. N., M.A. (Agra, India) Graduate student, Univ. of N. Car., Chapel Hill, N.
Car., Dept. of Statistics.

Pines, Sylvia F., M.A. (Michigan) Instr., Math. and Stat., 43-17 48th St., Long Island
City 4, N. Y.

Quastler, Henry, M.D. (Vienna) Medical Radiologist, Carle Hospital Clinic, Urbana, Ill.,
612 W. Nevada. 

Reiersol, Olav, Ph.D. (Stockholm) Teacher of stat., Univ. of Oslo, Oslo, Norway,
International House, 500 Riverside Dr., New York 27, N. Y.

Romanovsky, Vsevolod I., Ph.D. (Moscow) Prof. at the Univ. and Member of the Academy
of Sciences, Tashkend, U. S. S. R. 

Rust, Charles H., S.J., M.A. (St. Louis) Graduate student, St. Louis Univ., St. Louis, 
Mo., 221 N. Grand Blvd., St. Louis 3. 

Seal, Hilary L., B.Sc. (Univ. Coll., London) Head of Stat. Branch, Room 2, Old Bldg., G., 
Admiralty, Whitehall, London, S. W. 1, England. 

Serbein, Oscar N., Jr., M.S. (Iowa) Graduate student, Columbia Univ., New York 27, 
N. Y., Army Hall, Rm. 838H, 1560 Amsterdam Ave., New York 31.

Sholl, D. A., B.Sc. (London) Stat. in Math. Stat. Section of Chief Scientific Adviser’s
Div., Ministry of Works, 81 Lynmouth Ave., Bush Hill Park, Enfield, Middlesex, 
England. 

Siegel, Irving H., M.A. (New York) Chief, Economics Div., Veterans Adm., Washington, 
D. C., 5407 9th St., N. W., Washington 11. 

Sitgreaves, Rosedith, M.A. (Geo. Washington) Ass’t Stat., U. S. Public Health Service 
(on leave); Graduate student, Columbia Univ., New York 27, N. Y., Johnson Hall, 
411 W. 116th St. 

Tama, Joseph, B.A. (Washington) Pfc. U. S. Army, 5250 TIC; GHQ AFPAC; APO 500, 
c/o Postmaster, San Francisco, Calif. 

Tate, Merle W., Ed.M. (Harvard), M.A. (Montana) Assoc. Prof. of Educ., Hamilton
Coll., Clinton, N. Y. 

Thrall, Robert M., Ph.D. (Illinois) Ass’t Prof. of Math., Univ. of Mich., Ann Arbor,
Mich., 953 Spring St. 

Vaughn, Kenneth W., Ph.D. (Iowa) Director, Graduate Record Examination Office of the
Carnegie Foundation for the Advancement of Teaching; and Assoc. Director of
Cooperative Test Service of Amer. Council on Educ., 437 West 59th St., New York 19, N. Y.

Wallace, Clifford A., Sup’t of Quality, Camera Works, Eastman Kodak Co., 333 State St., 
Rochester, N. Y. 

Wilkins, J. Ernest, Jr., Ph.D. (Chicago) Mathematician, American Optical Co., S. I. D.,
Box A, Buffalo 15, N. Y. 

Wilkinson, Roger I., B.S.E.E. (Iowa State) Member Technical Staff, Bell Telephone Labs., 
463 West St., New York, N. Y. 



REPORT ON THE BOSTON MEETING OF THE INSTITUTE 


The twenty-fourth meeting of the Institute of Mathematical Statistics was held 
at the Hotel Statler, Boston, Massachusetts, on Saturday, December 28, 1946. 
The meeting was held in conjunction with the One Hundred Thirteenth Annual 
Meeting of the American Association for the Advancement of Science. The 
following 45 members of the Institute attended the meeting: 

K. J. Arnold, M. S. Bartlett, W. D. Baten, C. I. Bliss, G. W. Brier, G. W. Brown, T. H. 
Brown, B. H. Camp, C. W. Churchman, W. G. Cochran, J. H. Curtiss, D. B. DeLury, P. V. 
Dorweiler, Churchill Eisenhart, Benjamin Epstein, H. A. Freeman, Hilda Geiringer, H. H.
Germond, J. A. Greenwood, Boyd Harshbarger, W. A. Hendricks, E. H. C. Hildebrandt, 
W. C. Jacob, H. B. Kaitz, L. F. Knudsen, Walter Leighton, A. J. Lotka, J. W. Mauchly, 
Margaret Merrell, E. B. Mode, Frederick Mosteller, C. M. Mottley, Doris Newman, R. H. 
Noel, H. W. Norton, Otis Pope, C. J. Rees, C. F. Roos, P. J. Rulon, J. W. Tukey, W. M. 
Upholt, F. M. Wadley, C. L. Weaver, C. P. Winsor, W. J. Youden.

At the morning session, a joint session with the Biometrics Section of the 
American Statistical Association, the following program was presented with 
Professor E. B. Wilson of Harvard University as chairman: 

Topic: The Analysis of Variance in Biology

Papers: The Assumptions Underlying the Analysis of Variance
            Professor Churchill Eisenhart, University of Wisconsin and The National
            Bureau of Standards
        Some Consequences when the Assumptions are not Satisfied
            Professor W. G. Cochran, North Carolina State College
        The Use of Transformations
            Professor M. S. Bartlett, Cambridge University and the University of
            North Carolina

Discussion: Professor Boyd Harshbarger, Virginia Polytechnic Institute
            Dr. W. C. Jacob, Long Island Vegetable Research Farm
            Professor C. P. Winsor, Johns Hopkins University
            Dr. W. J. Youden, Boyce Thompson Institute

The program for the afternoon session, also a joint session with the Biometrics 
Section, under the chairmanship of Dr. E. J. DeBeer, Wellcome Research 
Laboratories, was as follows: 

Topic: The Analysis of Variance in Biology (continued)

Papers: The Analysis of Covariance
            Professor D. B. DeLury, Virginia Polytechnic Institute
        Discriminant Functions
            Professor George W. Brown, Iowa State College

Discussion: Professor W. D. Baten, Michigan State College
            Professor C. I. Bliss, Yale University
            Mr. W. A. Hendricks, U. S. Department of Agriculture




P. S. Dwyer, 
Secretary. 



ANNUAL REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1946 

New Opportunities 

The return to peacetime conditions presents the Institute with new
opportunities for expanding its activities and usefulness. An increased appreciation
for mathematical statistics has followed the many contributions made by our
members to the war effort. The numerous societies interested in specific
applications of statistics have come to look to the Institute both for leadership in
theory and for playing its part in the dissemination of new results. As a result of
the drastic interruption in the normal training of students during the war, there
is unusually keen competition for the services of capable statisticians. Those of
our members who are engaged in teaching are responsible for the execution of a
vigorous training program to meet current and future demands promptly and
without sacrifice of quality. In short, we are in a position, as never before, to
advance the development and efficient use of mathematical statistics. The
following account of some of our activities during the year will indicate, I believe,
that the record is creditable. Yet in many instances what has been accomplished
is only a beginning.


Meetings 

The Development Committee has repeatedly stressed the desirability of an 
extension in our customary schedule of meetings in order to provide additional 
contacts between mathematical statisticians and the users of statistics. Owing 
to the greater availability of railway and hotel accommodation in 1946, we
obtained our first opportunity to put this extension into effect. The regular winter
meeting with the American Statistical Association and other social science
organizations was resumed at Cleveland in January, while the late summer meeting
with the mathematicians took place at Cornell in September. In addition, two 
meetings were held with different sections of the American Association for the 
Advancement of Science, at St. Louis in March and at Boston in December. 
On both occasions the programs were expository and attracted large audiences. 
Finally, at the invitation of Princeton University, a one-day meeting at Princeton 
in November was devoted to the analysis of variance. While no joint sessions 
were conducted with engineering or industrial societies, several of our members 
took prominent parts in the programs of such societies. 

For the near future, it seems desirable to continue the practice of meeting in 
the winter with the ASA and social science groups and in the summer with the 
mathematical groups. In 1947 these meetings will be at Atlantic City, January 
24-27 and at Yale, September 1-5 respectively. It is not known whether
conditions in future years will produce a return to Christmas rather than January
meetings: for the present the hotel situation swings the balance in favor of 
January. 




In 1946 the membership of the program committee was enlarged so that it 
would be better equipped to arrange joint meetings with other societies. We 
owe our thanks to the members for their successful efforts in the face of difficulties 
which still attend the planning of a meeting. 

Annals 

Despite the scarcity of manuscripts in the later stages of the war, our editor, 
Professor S. S. Wilks, succeeded throughout in maintaining the annual volumes 
of the Annals at their usual size. During 1946, scarcity gave way to plenty. 
The number of papers of good quality submitted in recent months is sufficiently 
great that there will be more than enough, by current estimates, to fill the 1947 
volume. To narrow the scope of the Annals or to reject good papers would be 
undesirable. Accordingly, the Directors have authorized an increase of 100
pages in the 1947 volume if this is necessary to insure the publication of all
acceptable papers.

A gratifying testimony to the prominence of the Annals in its field is the marked 
increase in the demand for back numbers. Our Secretary-Treasurer reports 
that sales amounted to $3,235. To meet actual or anticipated orders, eleven 
issues were reprinted during 1946 at a cost of $2,809. 

For most members of the Institute, even those who serve on the Board, work 
on Institute affairs occupies only a minor portion of our time. The editor is 
never free from some forthcoming publication deadline. Initial perusal of
manuscripts, selection of referees, editorial decisions, handling of the production phases
of publication and much miscellaneous correspondence (not all of it pleasant) 
make editorial work a daily preoccupation, year in and year out. An annual 
word of thanks is an inadequate expression of our indebtedness to Professor 
Wilks. 


Membership and Finance 

At the beginning of 1945 there were 606 members. A year later this figure
had increased to 777 and at the end of 1946 the figure stood at 900. A fifty
percent increase in two years is another evidence of the healthy growth of the
Institute. It has been attained to a considerable extent through the hard work of
our Secretary-Treasurer, P. S. Dwyer, and the cooperation of individual members.

The Secretary-Treasurer also reports a very satisfactory net gain in assets 
of $2,627 during the year. Nevertheless, financial problems may arise in the 
near future. Printing and other costs have risen sharply, and the printing of an 
enlarged Annals will be an additional drain on our resources. Both the
Membership and Development Committees have given some thought to the need for
additional revenue that may face us soon. They have recommended
consideration of the possibility of Institutional Memberships, a device that has been found
satisfactory by some other societies. A continued growth in membership will 
also help greatly to finance expanded activities. 





Committees 

Inter-society affairs: The report of the 1944 Committee on Development,
stressing the need for closer cooperation amongst the various societies interested in
statistics, provided the stimulus for active efforts in this direction. A meeting
of representatives of these societies was called early in 1945 at the invitation of
the American Statistical Association. This meeting suggested that a
reconstitution of the ASA might enable it to become the central binding organization.
Accordingly, a committee of the ASA has worked for a considerable time on a
revision of the ASA constitution, which it is intended to submit to the votes of
ASA members early in 1947. The new constitution provides for representation
from other societies on the Council of the ASA, should these societies decide to
associate or affiliate with the ASA.

From our own point of view, it has seemed wise to delay action on certain
internal affairs while awaiting the outcome of these developments in the ASA.
Thus a statement of policy with regard to the formation of chapters of the IMS
is needed and the problem has been considered both by a special committee in
1945 and by the Development Committee in 1946. The latter committee
recommends that no decision be made pending examination of the provisions for joint
sponsorship of local and regional chapters in the new ASA constitution.
Similarly, our own Committee on Revising the Constitution and By-Laws has
deferred a final report until the attitude of our members towards the new
developments can be expressed. It is to be hoped that decisions can be taken in 1947.

Tabulation: The advances made in recent years in the construction of new types
of computing equipment justified an enlargement of our Committee on
Tabulation, which now includes experts both on the building of machines and on the
calculation and use of tables. The committee plans to keep our members
informed of progress in this field.

Government Service: Dr. W. Edwards Deming served as chairman of a new
committee on Mathematical Statistics and Statisticians in the Government Service.
Although the federal government employs many mathematical statisticians,
explicit recognition of the profession is lacking in many instances. As has
happened in other fields, statisticians are sometimes officially classed as economists
and little provision is made for mathematical statisticians in recruitment policies.
Moreover, it is probable that a number of branches of the government, at present
unaware of the functions of a statistician, could employ several with profit.
The new committee will endeavor to insure that mathematical statistics is
recognized and effectively utilized in the federal service.

Assistance to libraries: Like other professional societies, the Institute has
received a number of appeals from libraries in war areas whose periodicals were
looted or destroyed during the war. After careful consideration, the Board
decided that official action should be limited to the free provision of missing
copies of the Annals to all former subscribers who intend to renew subscriptions
for the future. In addition, a committee with Professor J. Neyman as chairman
was appointed to establish a procedure by which gifts of individual members
(books, reprints, back numbers of the Annals or cash for the purchase of back
numbers) could be handled. At the suggestion of this committee a general
appeal for the small sum of 50 cents per member was circulated with the
December billing. Individual collections are also being made at certain centers.

Teaching: The Committee on Teaching has not made as much progress as it 
would have liked, owing to the dispersal of its members and the taking up of new 
civilian posts. Members have, however, cooperated with the Committee on 
Applied Mathematical Statistics of the National Research Council, which is 
engaged on a somewhat similar survey. 

Rietz lecture: The first lecturer in the new series of lectures in honor of the late 
Henry Lewis Rietz will be Professor A. Wald. His topic will be “Sequential 
Estimation and Multi-Decisions”. The lecture will be delivered in connection 
with the Yale meetings, September 1947. 

Representatives: In addition to its committee work, the Institute cooperates,
through representatives, with the Division of Physical Sciences of the National
Research Council, the Joint Committee for the Development of Statistical
Applications in Engineering and Manufacturing, the American Association for the
Advancement of Science, the Inter-Society Committee on Federation and the
Policy Committee for Mathematics. The last committee, which was appointed
in 1946, will consider important problems that affect the mathematics profession
as a whole.

Nominations: The Committee on Nominations, consisting of Professor P. R. 
Rider (chairman), Professor B. H. Camp and Professor G. M. Cox, has made the 
following nominations for officers in 1947. 

President: W. Feller
Vice-Presidents: J. H. Curtiss
                 M. H. Hansen
Secretary-Treasurer: P. S. Dwyer

While it is perhaps improper to comment on nominations, I should like to
express my personal appreciation of Professor Dwyer’s action in being willing
to offer himself for re-nomination as Secretary-Treasurer. The successful
operation of the Institute rests mainly on the Secretary-Treasurer, and the demands
of the Office are even more continuous and exacting than those on the editor.
Professor Dwyer’s splendid work during his first three years of office, carried on
at considerable sacrifice of his research interests, deserves the best thanks and
appreciation of every member.

In conclusion, it is a pleasure to express my sincerest thanks to all committee 
chairmen and members and to all representatives for their excellent work for the 
good of the Institute, and to all Institute members for their loyal support. 

W. G. Cochran,
President, 1946.




Committees of the Institute

Committee                         Personnel

Development                       E. G. Olds (chairman), C. I. Bliss, M. A.
                                  Girshick, F. C. Mosteller, P. S. Olmstead,
                                  H. Scheffé

Membership                        W. Feller (chairman), C. C. Craig, P. A.
                                  Horst, T. Koopmans

Program                           J. H. Curtiss (chairman), M. Friedman, B.
                                  Harshbarger, W. N. Hurwitz, A. M. Mood,
                                  F. C. Mosteller, J. W. Tukey

Mathematical Statistics and       W. E. Deming (chairman)
Statisticians in the
Government Service

Revising the Constitution         M. H. Hansen (chairman), C. I. Bliss, A. T.
and By-Laws                       Craig, J. H. Curtiss, W. Shewhart

Tabulation                        C. Eisenhart (chairman), P. S. Dwyer, H.
                                  Goldstine, A. N. Lowan, H. W. Norton, G. R.
                                  Stibitz

Teaching                          H. Hotelling (chairman), W. Bartky, W. E.
                                  Deming, M. Friedman

Nominations                       P. R. Rider (chairman), B. H. Camp, G. M.
                                  Cox

Finance                           P. S. Dwyer (chairman), L. A. Knowler, C. F.
                                  Roos, F. F. Stephan

Subscription to Purchase          J. Neyman (chairman), W. Feller, P. L. Hsu
Annals for Countries
Devastated by War


Society                           Representatives

Inter-Society Committee on        J. H. Curtiss, P. S. Olmstead
Federation

Policy Committee for              W. Feller
Mathematics

Joint Committee for the           F. C. Mosteller, S. S. Wilks
Development of Statistical
Applications in Engineering
and Manufacturing

American Association for the      G. W. Snedecor
Advancement of Science

Division of Physical Sciences,    W. Bartky
NRC


REPORT OF THE SECRETARY-TREASURER OF 
THE INSTITUTE FOR 1946 

The Institute of Mathematical Statistics held five meetings during 1946, at 
Cleveland on January 24-27, at St. Louis on March 30, at Ithaca on August 
22-23, at Princeton on November 1, and at Boston on December 28.

The large number of meetings has necessitated frequent mailings to the 
membership. Memoranda to members, with appropriate enclosures, were sent 
out in January, March, June, July, October, and November. 

The Secretary-Treasurer wishes to acknowledge the cooperation of the
members of the Institute in paying bills promptly, in considerable activity leading to
an increase in membership, and in general looking after the interests of the
Institute.

At the beginning of 1946 the Institute had 777 members. During the year 
180 new members joined the Institute, an increase of 23%. However, during 
1946 the Institute lost 57 members. Of these, 15 resigned, 37 were dropped for 
non-payment of dues, and 5 are deceased. Some of the 37 dropped we have 
been unable to contact, and it is very probable that, in some cases, membership 
will be resumed in the future. The net increase in members during the year was 
123, or about 16%, making a total of 900 members. 
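
The membership figures quoted above can be reconciled with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are assumptions introduced here, and every number is taken directly from the preceding paragraph.

```python
# Membership reconciliation for 1946, using only figures quoted in the report.
# Variable names are illustrative and do not appear in the original report.
start_of_year = 777
joined = 180
resigned, dropped, deceased = 15, 37, 5

lost = resigned + dropped + deceased          # members lost during the year
net_increase = joined - lost                  # net change in membership
end_of_year = start_of_year + net_increase    # membership at the end of 1946

# Percentages are computed relative to the membership at the start of the year.
gross_pct = 100 * joined / start_of_year
net_pct = 100 * net_increase / start_of_year

print(f"lost={lost}, net increase={net_increase}, end of year={end_of_year}")
print(f"gross increase={gross_pct:.1f}%, net increase={net_pct:.1f}%")
```

Both percentages are taken on the opening figure of 777, which is consistent with the rounded values of 23% and 16% stated in the report.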

The following members died during the year: 

Professor O. F. Banos
Professor S. A. Cudmore 
Professor Dunham Jackson 
Dr. Walter F. Schilling 
Professor C. C. Wagner 

The office of the Secretary-Treasurer sent a reprint of an Annals article and
information about the Institute to 1800 persons interested in Quality Control. 
At least 28 of the new members became members as a result of this drive. As a 
continuation of a campaign started in 1945, the Institute also sent literature 
about the Annals to several hundred libraries and laboratories. 

The Secretary-Treasurer wishes to acknowledge the continued assistance of 
Professor Lloyd Knowler in caring for the back issues of the Annals which are 
stored at Iowa City. 

A few comments about the financial statement which appears below are in
order. In addition to the increase in membership, mentioned above, the chief
rise in income resulted from the unprecedented sales in back issues which
amounted to $3,234.88, an increase over the preceding year (the previous high)
of 86%. These heavy sales, however, depleted the supplies of many of our early
issues, so that we were forced to reprint eleven of these issues and also the
cumulative index during the year. This cost $2,809.00 (for 500 copies of each) and
indicates that a much larger portion of our assets is in inventory, as shown in
Exhibit D.




Following the instructions of the Finance Committee, Professor H. C. Carver
was paid for his share of all issues in which he and the Institute had joint
ownership.

Nine members have paid life memberships during the year, increasing the 
total of life membership funds by $812.50. 

The net gain in assets of $2,627.23 is very satisfactory even though this gain 
is evident in increased inventory and not in a better cash position. 


FINANCIAL STATEMENT
December 31, 1945, to December 31, 1946

A. Receipts

Balance on Hand, December 31, 1945                                  $7,548.22
Dues                                                                 4,638.40
Life Membership Payments                                               812.50
Subscriptions                                                        2,057.54
Sale of Back Numbers                                                 3,234.88
Income from Investments                                                150.00
Miscellaneous                                                          121.29
                                                                   ----------
Total                                                              $18,562.83


B. Expenditures

Annals—Current
    Office of Editor                                    $125.00
    Waverly Press                                      4,566.27
                                                                    $4,691.27
Annals—Back Numbers
    Purchase from H. C. Carver                           644.50
    Reprinted 500 copies                               2,809.00
        Vol. I #1, II #2, II #3, III #3, IV #1, VII #3, VII #2,
        VIII #1, 2, 3, 4, Cumulative Index
    Iowa City Office                                      41.46
    Binding                                               68.00
                                                                     3,562.96
Office of President                                                     25.62
Mathematical Reviews                                                   100.00
Office of the Secretary-Treasurer
    Printing, mimeographing, programs, etc.
        (including stamped envelopes)                   $967.14
    Printing 1800 copies of Wald-Wolfowitz article       140.00
    Postage and supplies                                 375.00
    Clerical help                                      1,420.25
                                                                     2,902.39
Miscellaneous                                                           39.04
Balance on Hand, December 31, 1946 (Cash and Bonds)                  7,241.55
                                                                   ----------
Total                                                              $18,562.83





C. Summary of Receipts and Expenditures

Balance on Hand,* December 31, 1945                                 $7,548.22
Receipts during 1946                                                11,014.61
Expenditures during 1946                                            11,321.28
Balance on Hand,* December 31, 1946                                  7,241.55

D. Comparison of Assets on December 31, 1945 and December 31, 1946

                                                      1945          1946

U. S. Government G Bonds                         $6,000.00     $5,000.00
Life Membership Funds                               888.00      1,888.00 (Bonds)
                                                    327.00        139.50 (Bank Dep.)
Additional Bank Deposits                            333.22        214.05
Current Accounts Receivable                         255.35        452.62
Estimated Value (Cost) of back issues of Annals   4,497.95      7,234.58**
                                                ----------    ----------
Total                                           $12,301.52    $14,928.75

Net Gain 1946                                                   2,627.23
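
The exhibits above are internally consistent, which can be confirmed by re-adding the columns. The sketch below is offered only as an illustrative check: the dictionary keys and variable names are assumptions made here, and every amount is copied from Exhibits A, B and D of the statement.

```python
from decimal import Decimal as D

# Amounts are copied from Exhibits A, B and D of the financial statement.
# Keys and variable names are illustrative, not part of the original report.
opening_balance = D("7548.22")
receipts = {
    "dues": D("4638.40"),
    "life_memberships": D("812.50"),
    "subscriptions": D("2057.54"),
    "back_numbers": D("3234.88"),
    "investments": D("150.00"),
    "miscellaneous": D("121.29"),
}
expenditures = {
    "annals_current": D("125.00") + D("4566.27"),
    "annals_back_numbers": D("644.50") + D("2809.00") + D("41.46") + D("68.00"),
    "office_of_president": D("25.62"),
    "mathematical_reviews": D("100.00"),
    "office_of_secretary_treasurer": D("967.14") + D("140.00") + D("375.00") + D("1420.25"),
    "miscellaneous": D("39.04"),
}

total_receipts = sum(receipts.values())
total_expenditures = sum(expenditures.values())
closing_balance = opening_balance + total_receipts - total_expenditures

# Exhibit D: assets at each year end (bonds, life membership funds,
# bank deposits, accounts receivable, inventory of back issues).
assets_1945 = sum(D(x) for x in ("6000.00", "888.00", "327.00", "333.22", "255.35", "4497.95"))
assets_1946 = sum(D(x) for x in ("5000.00", "1888.00", "139.50", "214.05", "452.62", "7234.58"))
net_gain = assets_1946 - assets_1945

print("receipts:", total_receipts)
print("expenditures:", total_expenditures)
print("closing balance:", closing_balance)
print("net gain in assets:", net_gain)
```

Exact decimal arithmetic is used rather than floating point so that sums of dollar-and-cent amounts compare exactly with the printed totals.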


E. Liabilities of Institute of Mathematical Statistics as of December 31, 1946

All bills which have been presented have been paid and there are no outstanding accounts
against the Institute. The $2,027.50 in Life Membership payments require the Institute to
provide the privileges of membership for life for the 26 members who have made payments.
Also, $2,686.71 should be credited to 1947 dues and subscriptions.


December 31, 1946 


Paul S. Dwyer 

Secretary-Treasurer.


* In form of bank deposit and government bonds. 

** Value of Annals calculated at 67 cents per copy, and based on physical inventory.



ANNUAL REPORT OF THE EDITOR FOR 1946

During 1946 there was a considerable increase in the number of manuscripts
submitted to the Editorial Committee of the Annals. A total of 49 papers,
including 18 short notes, were published in the 1946 volume of the Annals. The
publication of these papers together with various official reports of the Institute
and the Directory of the Institute required a total of 555 pages. Plans are
already under way to expand the 1947 volume of the Annals to 600 pages.

During recent years there has been a very noticeable broadening of interest 
in the field of probability and statistical theory on the part of readers and
contributors to the Annals. Contributors to the 1946 volume came from university
departments of astronomy, biology, mathematics, sociology and statistics; from 
Army, Navy and other government groups; and from industrial laboratories and 
quality control departments. More recently, contributions have been received 
from physicists, chemists and other groups. More contributions are now being 
received from overseas than in previous years. Every effort is being made to 
keep the Annals balanced with respect to these various directions of interest in 
probability and statistical problems. It is believed that one of the most
effective things which could be done for the readers of the Annals is to publish
expository articles from time to time on new fields of development in probability
and statistical theory. Invitations have been accepted by several individuals to 
prepare such articles. 

Dr. Thornton C. Fry has asked to be relieved from the Editorial
Committee, as of January 1, 1947. The Editor wishes to take this opportunity to
express his gratitude for the service which Dr. Fry has rendered in connection with 
the editorial work on the Annals during the past nine years. 

On behalf of the Editorial Committee for the Annals, the Editor wishes to
acknowledge with thanks the refereeing assistance which has been provided by 
the following persons during 1946: R. L. Anderson, T. W. Anderson, David 
Blackwell, Z. W. Birnbaum, K. L. Chung, W. J. Dixon, J. L. Doob, M. A. 
Girshick, T. E. Harris, L. Henkin, M. Kac, Irving Kaplansky, Bradford F.
Kimball, T. Koopmans, H. Levene, H. B. Mann, P. J. McCarthy, F. C. Mosteller, H.
E. Robbins, D. F. Votaw, J. E. Walsh and C. P. Winsor. The Editor is also 
indebted to the following individuals at Princeton University for preparation of 
manuscripts for the printer, and other editorial assistance: Mrs. Gladys B. Huling, 
Mrs. Eleanor C. Schoenly and J. E. Walsh. 

S. S. Wilks 
Editor.

December 31, 1947





CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 


ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 
Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
members excepted, who have been members for twenty-three months prior to the date 
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term as 
determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be one
year and that of the Secretary-Treasurer three years. Elections shall be by majority 
ballots at Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term of 
two years, and another for a term of three years. Thereafter the Board of Directors shall 
elect from among the Fellows one member annually at their first meeting after their
election for a term of three years. The President shall designate one of the Vice-Presidents as
Chairman of this Committee. 


ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the call 
of the Chairman or any member of the Committee provided notice of such call and the 
purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that 
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall 
constitute a quorum. At a regularly convened meeting of the Committee on
Membership, two members shall constitute a quorum.

ARTICLE V 
Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises.

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 
Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any
regularly convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 





INSTITUTE OF MATHEMATICAL STATISTICS 


BY-LAWS 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and 
Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional
nominations may be submitted in writing, if signed by at least ten Fellows of the Institute, up to
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the
correspondence of the Institute. Subject to the direction of the Board, he shall have charge
of the archives and other tangible and intangible property of the Institute and once a year 
he shall publish in the Annals of Mathematical Statistics a classified list of all Members and 
Fellows of the Institute. He shall send out calls for annual dues and acknowledge receipt 
of same; pay all bills approved by the President for expenditures authorized by the Board 
or the Institute; keep a detailed account of all receipts and expenditures, prepare a
financial statement at the end of each year and present an abstract of the same at the annual
meeting of the Institute after it has been audited by a Member or Fellow of the Institute 
appointed by the President as Auditor. The Auditor shall report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the
responsibility for all editorial matters concerning the editing of the Annals of Mathematical
Statistics. He shall, with the advice and consent of the Board, appoint an Editorial
Committee of not less than twelve members to co-operate with him; four for a period of five years,
four for a period of three years, and the remaining members for a period of two years,
appointments to be made annually as needed. All appointments to the Editorial
Committee shall terminate with the appointment of a new Editor. The Editor shall serve as
editorial adviser in the publication of all scientific monographs and pamphlets authorized 
by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. The power of election to the different grades of 
Membership, except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the
Secretary-Treasurer an announcement indicating the qualifications requisite for the
different grades of membership. The Committee shall review these qualifications
periodically and shall make such changes in these qualifications and make such recommendations
with reference to the number of grades of membership as it deems advisable. The power
to elect worthy applicants to the grades of Member and Junior Member shall reside in the
Committee, which may delegate this power to the Secretary-Treasurer, subject to such
reservations as the Committee considers appropriate. The Committee shall make
recommendations to the Board of Directors with reference to placing members in other grades
of membership. The Committee shall give its attention to the question of increasing the
number of applicants for membership and shall advise the Secretary-Treasurer on plans
for that purpose.


ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Junior Members shall be two dollars and 
fifty cents. 

The annual dues of Fellows shall be five dollars. The annual dues of Sustaining 
Members shall be fifty dollars. Honorary Members shall be exempt from all dues. 

(a) Exception. In the case that two Members of the Institute are husband and wife
and they elect to receive between them only one copy of the Official Journal, the annual 
dues of each shall be three dollars and seventy-five cents. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding yearly dues and which will not
otherwise alter his status as a Member or Fellow. The amount of this payment will depend
upon the age of this Member or Fellow and will be based upon a suitable table and rate of 
interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Junior Member of the Institute serving, except as a 
commissioned officer, in the Armed Forces of the United States or of one of its allies, may 
upon notification to the Secretary-Treasurer be excused from the payment of dues until the 
January first following his discharge from the Service. He shall have all privileges of 
membership except that he shall not receive the Official Journal. However during the 
first year of his resumed regular membership he may have the right to purchase, at $2.50 
per volume, one copy of each volume of the Official Journal published during the period 
of his service membership. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow, Member, or Junior Member include a subscription to 
the Official Journal. The annual dues of a Sustaining Member include two subscriptions 
to the Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person’s name may be stricken from the rolls and all privileges of
membership withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues. 





ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed
amendment has been previously approved by the Board of Directors.



PROBLEMS IN PROBABILITY THEORY 

By Harald Cramér
University of Stockholm 

1. Introduction. The following survey of problems in probability theory
has been written for the occasion of the Princeton Bicentennial Conference on
“The Problems of Mathematics,” Dec. 17-19, 1946. It is strictly confined to
the purely mathematical aspects of the subject. Thus all questions concerned
with the philosophical foundations of mathematical probability, or with its
ever increasing fields of application, will be entirely left out.

No attempt at completeness has been made, and the choice of the problems
considered is, of course, highly subjective. It is also necessary to point out
explicitly that the literature of the war years has only recently, and still far
from completely, been available in Sweden. Owing to this fact, it is almost
unavoidable that this paper will be found incomplete in many respects.

I. FUNDAMENTAL NOTIONS 

2. Probability distributions. From a purely mathematical point of view,
probability theory may be regarded as the theory of certain classes of additive
set functions, defined on spaces of more or less general types. The basic
structure of the theory has been set out in a clear and concise way in the well-known
treatise by Kolmogoroff [53]. We shall begin by recalling some of the main
definitions. Note that the word additive, when used in connection with sets
or set functions, will always refer to a finite or enumerable sequence of sets.

Let $\omega$ denote a variable point in an entirely arbitrary space $\Omega$, and consider
an additive class $C$ of sets in $\Omega$, such that the whole space $\Omega$ itself is a member of
$C$. Further, let $P(S)$ be an additive set function, defined for all sets $S$ belonging
to the class $C$, and suppose that

$$P(S) \ge 0 \quad \text{for all } S \text{ in } C,$$

$$P(\Omega) = 1.$$

We shall then say that $P(S)$ is a probability measure, which defines a probability
distribution in $\Omega$. For any set $S$ in $C$, the quantity $P(S)$ is called the probability
of the event expressed by the relation $\omega \subset S$, i.e. the event that the variable
point $\omega$ takes a value belonging to $S$. Accordingly we write

$$P(S) = P(\omega \subset S).$$

Suppose now that $\omega' = g(\omega)$ is a function of the variable point $\omega$, defined
throughout the space $\Omega$, the values $\omega'$ being points of another arbitrary space
$\Omega'$. Let $S'$ be a set in $\Omega'$ and denote by $S$ the set of all points $\omega$ such that
$\omega' = g(\omega)$ belongs to $S'$. Whenever $S$ belongs to $C$, we define a set function $P'(S')$
by writing

$$P'(S') = P(S).$$



It is then easy to see that $P'(S')$ is defined for all $S'$ belonging to a certain
additive class $C'$ in the new space $\Omega'$, and that $P'(S')$ is a probability measure
in $\Omega'$, such that $P'(S')$ signifies the probability of the event $\omega' \subset S'$ (which is
equivalent to $\omega \subset S$). We shall say that $P'(S')$ is attached to the probability
distribution in $\Omega'$ which is induced by the given distribution in $\Omega$ and the function
$\omega' = g(\omega)$.
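
The induced distribution can be illustrated on a finite space. The following Python sketch is not part of the original survey: the space (the 36 ordered outcomes of two fair dice), the function g (the sum of the faces), and the names `induced_distribution`, `P` and `P_sum` are all assumptions chosen for illustration. It computes P'(S') = P(S) for every one-point set S'.

```python
from collections import defaultdict
from fractions import Fraction


def induced_distribution(p, g):
    """Return P'({w'}) = P({w : g(w) = w'}) for a finite space.

    p maps each point w of the space to its probability, and g is the
    function w' = g(w).  Both are illustrative inputs.
    """
    p_prime = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for omega, prob in p.items():
        p_prime[g(omega)] += prob
    return dict(p_prime)


# Hypothetical space: ordered outcomes of two fair dice, each of probability 1/36.
P = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

# g maps an outcome to the sum of the two faces, so the new space is {2, ..., 12}.
P_sum = induced_distribution(P, lambda w: w[0] + w[1])
```

Exact fractions are used so that the total mass of the induced distribution can be compared with 1 without rounding.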

3. Random variables. Consider in particular the case when $\omega'$ is a real
number $\xi$, such that $\xi = g(\omega)$ is a real-valued $C$-measurable function of the
argument $\omega$. Then $C'$ includes the class $B_1$ of all Borel sets $S'$ of the space
$\Omega' = R_1$ of all real numbers, and we shall call $\xi$ a one-dimensional real random variable.
The probability of the event $\xi \subset S'$ is uniquely defined for any Borel set $S'$ of
$R_1$, as soon as the function

$$F(x) = P(\xi \le x)$$

is known for all real $x$. $F(x)$ is called the distribution function (d.f.) of the
random variable $\xi$. If the function $\xi = g(\omega)$ is integrable over $\Omega$ with respect
to the measure $P(S)$, we write

$$E\xi = \int_{\Omega} g(\omega)\, dP = \int_{-\infty}^{\infty} x\, dF(x),$$

and denote this expression as the expectation or mean value of the random
variable $\xi$. Any real-valued $B_1$-measurable function $\eta = h(\xi)$ is also a random
variable with the probability distribution induced by the original $\omega$-distribution
and the function $\eta = h(g(\omega))$. If $\eta$ is integrable over $\Omega$ with respect to $P$, its
mean value may be written in the form

$$E\eta = Eh(\xi) = \int_{\Omega} h(g(\omega))\, dP = \int_{-\infty}^{\infty} h(x)\, dF(x).$$
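
The two expressions for the mean value of a function of a random variable can be compared on a finite space, where both integrals reduce to sums. This is a minimal sketch under assumptions that are not in the original: the space is again the 36 outcomes of two fair dice, g is the sum of the faces, h is the square, and all variable names are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite space with a uniform probability measure.
P = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}


def g(w):
    """The random variable: sum of the two faces."""
    return w[0] + w[1]


def h(x):
    """A function of the random variable."""
    return x * x


# Integral of h(g(w)) over the original space with respect to P.
mean_over_space = sum(h(g(w)) * p for w, p in P.items())

# Induced distribution of the random variable on the real line.
dist = {}
for w, p in P.items():
    dist[g(w)] = dist.get(g(w), Fraction(0)) + p

# Integral of h(x) with respect to the induced distribution.
mean_over_line = sum(h(x) * p for x, p in dist.items())
```

Both sums give the same exact fraction, which is the finite analogue of the change of variable in the formula above.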

More generally, if $\omega' = (\xi_1, \ldots, \xi_n)$ is a point in an $n$-dimensional Euclidean
space $R_n$, while $C'$ includes the class $B_n$ of all Borel sets of $R_n$, we are
concerned with an $n$-dimensional real random variable. The distribution of this
variable, which is also called the joint distribution of the $n$ one-dimensional
variables $\xi_1, \ldots, \xi_n$, is uniquely defined, as soon as the joint d.f.

$$F(x_1, \ldots, x_n) = P(\xi_1 \le x_1, \ldots, \xi_n \le x_n)$$

is known for all real $x_1, \ldots, x_n$.

The variables $\xi_1, \ldots, \xi_n$ are said to be independent, if
$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n)$, where $F_\nu(x_\nu)$ is the d.f. of the variable $\xi_\nu$.

The extension to complex random variables is obvious. Suppose e.g. that
$\xi = g(\omega)$ and $\eta = h(\omega)$ are two one-dimensional real variables, and consider
the complex variable $\xi + i\eta = g(\omega) + ih(\omega)$. By definition, we identify the
distribution of this variable with that of the two-dimensional real variable
$(\xi, \eta)$, and we put

$$E(\xi + i\eta) = E\xi + iE\eta.$$





Joint distributions of several complex variables are introduced in a
corresponding way.

4. Characteristic functions. If $\xi$ is a one-dimensional real random variable,
the mean value

$$\varphi(z) = E e^{iz\xi} = \int_{-\infty}^{\infty} e^{izx}\, dF(x)$$

exists for all real $z$, and we have

$$|\varphi(z)| \le 1, \qquad \varphi(0) = 1.$$

$\varphi(z)$ is called the characteristic function (c.f.) of the distribution corresponding
to the variable $\xi$. The reciprocal formula (Lévy)

$$F(x) - F(y) = \frac{1}{2\pi i} \lim_{Z \to \infty} \int_{-Z}^{Z} \frac{e^{-izy} - e^{-izx}}{z}\, \varphi(z)\, dz,$$

which holds for any continuity points $x$ and $y$ of $F$, shows that there is a
one-one correspondence between the d.f. $F(x)$ and the c.f. $\varphi(z)$. As we shall see
below, the c.f. provides a powerful analytical tool for operations with
probability distributions.
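
The reciprocal formula can be checked numerically in a case where both sides are known. The sketch below is an illustration added here, not part of the original: it assumes the standard normal distribution, whose c.f. is exp(-z^2/2), truncates the integral at a finite Z, and uses a simple midpoint rule. The function names and the default values of `Z` and `steps` are hypothetical choices.

```python
import cmath
import math


def normal_cf(z):
    """c.f. of the standard normal distribution."""
    return math.exp(-0.5 * z * z)


def normal_df(x):
    """d.f. of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def df_increment(cf, x, y, Z=10.0, steps=20000):
    """Approximate F(x) - F(y) from the c.f. by a midpoint rule on (-Z, Z).

    steps should be even, so that no midpoint falls exactly on z = 0,
    where the integrand is written as a quotient.
    """
    h = 2.0 * Z / steps
    total = 0j
    for k in range(steps):
        z = -Z + (k + 0.5) * h
        total += (cmath.exp(-1j * z * y) - cmath.exp(-1j * z * x)) / z * cf(z)
    # Divide by 2*pi*i; the exact value is real, so rounding noise in the
    # imaginary part is discarded.
    return (total * h / (2j * math.pi)).real
```

For example, `df_increment(normal_cf, 1.0, -1.0)` can be compared with `normal_df(1.0) - normal_df(-1.0)`. The rapid decay of the normal c.f. is what makes the truncation at a moderate Z harmless; for a c.f. that decays slowly the same rule would need far more care.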

When a complex-valued function $\varphi(z)$ of the real variable $z$ is given, it is
often important to be able to decide whether $\varphi(z)$ is or is not the c.f. of some
distribution. If we assume a priori that $\varphi(0) = 1$, each of the following
conditions is necessary and sufficient for $\varphi(z)$ to be a c.f.

A. $\varphi(z)$ should be bounded and continuous for all $z$, and such that the integral

$$\int_0^A \int_0^A \varphi(z - u)\, e^{ix(z - u)}\, dz\, du$$

is real and non-negative for all real $x$ and all $A > 0$ (Cramér [11], in
simplification of an earlier result due to Bochner, [4]).

B. There should exist a sequence of functions $\psi_1(z), \psi_2(z), \ldots$ such that

$$\varphi(z) = \lim_{n \to \infty} \int_{-\infty}^{\infty} \psi_n(x + z)\, \overline{\psi_n(x)}\, dx$$

holds uniformly in every finite $z$-interval (Khintchine, [45]).

These general theorems are not always easy to apply in practice. Among
less general results which are more easily applicable, we mention the almost
trivial fact that a function $\varphi(z)$ which near $z = 0$ is of the form $\varphi(z) = 1 + o(z^2)$
cannot be a c.f. unless $\varphi(z) = 1$ for all $z$, and the two following theorems:

1) An integral function $\varphi(z)$ of order $\gamma < 1$ can never be a c.f. (Lévy, [64]), and

2) an integral function $\varphi(z)$ of finite order $\gamma > 2$ cannot be a c.f. unless the
convergence exponent of its zeros is equal to $\gamma$ (Marcinkiewicz, [72]). The
latter result shows e.g. that no function of the form $e^{g(z)}$, where $g(z)$ is a
polynomial of degree $> 2$, can be a c.f.

It would be highly desirable to obtain further results in this direction. 





The c.f. of the joint distribution of $n$ real random variables $\xi_1, \ldots, \xi_n$ is the
function $\varphi(z_1, \ldots, z_n)$ defined by the relation

$$\varphi(z_1, \ldots, z_n) = E e^{i(z_1 \xi_1 + \cdots + z_n \xi_n)}.$$

Most of the above results for c.f. in one variable can be directly generalized 
to the multi-variable case. 

5. Random sequences and random functions. Let $t$ be a variable point in
an arbitrary space $T$, and consider the space $\Omega$, where each point $\omega$ is a
real-valued function $\omega = x(t)$ of the variable argument $t$. Let $t_1, \ldots, t_n$ be any
finite set of distinct points $t$. The set of all functions $\omega = x(t)$ satisfying the
inequalities

$$a_j < x(t_j) \le b_j \qquad (j = 1, \ldots, n)$$

will be called an interval in the space $\Omega$. The Borel sets in $\Omega$ will be defined as
the smallest additive class $B$ of sets in $\Omega$ containing all intervals.

Suppose now that, for any choice of $n$ and the $t_j$, the variables $x(t_1), \ldots, x(t_n)$
are random variables having a known $n$-dimensional joint distribution. If the
family of all distributions corresponding in this way to finite sequences
$t_1, \ldots, t_n$ satisfies certain obvious consistency conditions, a fundamental theorem
due to Kolmogoroff asserts that this family determines a unique probability
distribution in the space $\Omega$ of all functions $x(t)$. The corresponding probability

$$P(S) = P(x(t) \subset S)$$

is uniquely defined for all Borel sets $S$ of $\Omega$.

Consider in particular the case where $T$ is the set of non-negative integers
$t = 0, 1, 2, \ldots$. The space $\Omega$ then is the space of all sequences $(x_0, x_1, \ldots)$
of real numbers. As soon as the joint distribution of any finite number of
variables $x_{\nu_1}, \ldots, x_{\nu_n}$ is defined, and these distributions are mutually
consistent, it then follows that there is a unique probability distribution of the
random sequence $(x_0, x_1, \ldots)$, the corresponding probability being defined
for every Borel set of the space $\Omega$ of sequences. Similarly we may consider the
doubly infinite sequence $(\ldots, x_{-1}, x_0, x_1, \ldots)$.

Consider further the more general case when $T$ is any set of real numbers.
Then $\Omega$ is the space of all real-valued functions $\omega = x(t)$ defined on the set $T$,
and as before the knowledge of the distributions for all finite sets of variables
$x(t_1), \ldots, x(t_n)$ permits us to determine a probability distribution in the space
$\Omega$ of random functions $x(t)$, the probability $P(S) = P(x(t) \subset S)$ being always
defined for all Borel sets $S$ in $\Omega$.

The generalization of the above considerations to complex-valued random 
sequences and functions is immediate. 


6. Various modes of convergence. Consider a sequence $F_1(x), F_2(x), \ldots$
of d.f.'s, and let the corresponding c.f.'s be $\varphi_1(t), \varphi_2(t), \ldots$. In order that $F_n(x)$
converge to a d.f. $F(x)$, in every continuity point of the latter, it is necessary
and sufficient¹ that $\varphi_n(t)$ converge for every real $t$ to a limit $\varphi(t)$ which is
continuous at $t = 0$. Then $\varphi(t)$ is the c.f. corresponding to the d.f. $F(x)$.

Further, let $x$ and $x_1, x_2, \ldots$ be complex-valued random variables, such
that the random sequence $(x, x_1, x_2, \ldots)$ has a well defined distribution. We
shall be concerned with various modes of convergence of $x_n$ to $x$.

A. When $P(|x_n - x| > \epsilon) \to 0$ as $n \to \infty$, for any $\epsilon > 0$, we shall say that
$x_n$ converges to $x$ in probability.

B. When $E|x_n - x|^{\gamma} \to 0$, as $n \to \infty$, where $\gamma > 0$ is fixed, we shall say that
$x_n$ converges to $x$ in the mean of order $\gamma$. Unless otherwise stated we shall in
the sequel always consider the case $\gamma = 2$, and in this case we shall use the
notation

$$\operatorname*{l.i.m.}_{n \to \infty} x_n = x.$$

C. When $P(\lim_{n \to \infty} x_n = x) = 1$, we shall say that $x_n$ converges with probability
one, or converges almost certainly to $x$.

With respect to the last definition, we may remark that the set defined by
the relation $\lim x_n = x$ is always a Borel set in the space of our random sequence,
so that the probability of this relation is well defined. In fact, this probability
is given by the expression

$$\lim_{m \to \infty} \lim_{n \to \infty} \lim_{p \to \infty} P\left(|x_\nu - x| < \frac{1}{m} \text{ for } \nu = n, n + 1, \ldots, n + p\right)$$

where the limit process applies to a probability attached to a Borel set in a finite
number of dimensions. The case of almost certain convergence is precisely
the case when this expression takes the value 1.

Convergence in the mean of any positive order, as well as almost certain
convergence, both imply convergence in probability, which may be written
symbolically B $\to$ A and C $\to$ A. Between B and C, there is no simple relation
of this kind. Further, A and B both imply almost certain convergence for any
partial sequence $x_{n_1}, x_{n_2}, \ldots$ such that the subscripts $n_k$ increase sufficiently
rapidly with $k$.
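
The relations between the modes A, B and C can be made concrete with a standard textbook example that is not taken from the survey itself. Assume the space is the interval [0, 1) with the uniform measure, and for n = 2^k + j (0 <= j < 2^k) let x_n be 1 on [j/2^k, (j+1)/2^k) and 0 elsewhere. Then P(x_n != 0) = E|x_n|^2 = 2^(-k), so x_n tends to 0 in probability and in the mean, while at every point x_n equals 1 infinitely often, so there is no almost certain convergence. The partial sequence with n_k = 2^k does converge almost certainly. The function names below are hypothetical.

```python
def sliding_indicator(n, omega):
    """Value of x_n at the point omega, for n >= 1 and 0 <= omega < 1."""
    k = n.bit_length() - 1   # n = 2**k + j with 0 <= j < 2**k
    j = n - (1 << k)
    # omega lies in [j / 2**k, (j + 1) / 2**k) exactly when the integer
    # part of omega * 2**k equals j.
    return 1 if int(omega * (1 << k)) == j else 0


def prob_nonzero(n):
    """P(x_n != 0), which also equals E|x_n|**2 for this 0-1 variable."""
    k = n.bit_length() - 1
    return 2.0 ** (-k)


def hits_in_block(k, omega):
    """Number of indices n in [2**k, 2**(k+1)) with x_n(omega) = 1."""
    return sum(sliding_indicator(n, omega) for n in range(1 << k, 1 << (k + 1)))
```

For any fixed point, `hits_in_block(k, omega)` is 1 for every k, which is the failure of almost certain convergence, whereas `sliding_indicator(2**k, omega)` is 0 for all sufficiently large k whenever omega > 0.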

II. PROBLEMS CONNECTED WITH THE ADDITION OF
INDEPENDENT VARIABLES

7. During the early development of the theory of probability, the majority
of problems considered were connected with gambling. The gain of a player
in a certain game may be regarded as a random variable, and his total gain in a
sequence of repetitions of the game is the sum of a number of independent
variables, each of which represents the gain in a single performance of the game.
Accordingly a great amount of work was devoted to the study of the probability
distributions of such sums. A little later, problems of a similar type appeared
in connection with the theory of errors of observation, when the total error was
considered as the sum of a certain number of partial errors due to mutually
independent causes. At first only particular cases were considered, but
gradually general types of problems began to arise, and in the classical work of Laplace
several results are given concerning the general problem of studying the distribution
of a sum

$$z_n = x_1 + \cdots + x_n$$

of independent variables, when the distributions of the $x_j$ are given. This
problem may be regarded as the very starting point of a large number of those
investigations by which the modern Theory of Probability was created. The
efforts to prove certain statements of Laplace, and to extend his results further
in various directions, have largely contributed to the introduction of rigorous
foundations of the subject, and to the development of the analytical methods.
At the same time, more general types of problems have developed from the
original problem, and the number and importance of practical applications
have been steadily increasing.

¹ As I have already stated in a paper published in 1938, there is an error in the
statement of this theorem given in my Cambridge Tract [9] Random Variables and Probability
Distributions. For the truth of the theorem, it is essential that $\varphi_n(t)$ should be supposed
to converge to $\varphi(t)$ for every real $t$. However, in the particular case when the limit $\varphi(t)$
is analytic and regular in the vicinity of $t = 0$, it can be proved that it is sufficient to assume
convergence in some interval $|t| < a$.

8. Composition of distributions. Let $x_1$ and $x_2$ be two independent variables,
with the d.f.'s $F_1$ and $F_2$, and the c.f.'s $\varphi_1$ and $\varphi_2$, and let the sum $x_1 + x_2$ have
the d.f. $F$ and the c.f. $\varphi$. Then

$$F(x) = \int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y) = \int_{-\infty}^{\infty} F_2(x - y)\, dF_1(y).$$

We shall say that $F$ is the composition of $F_1$ and $F_2$, and write this as a symbolical
multiplication:

$$F = F_1 * F_2 = F_2 * F_1.$$

To this symbolical multiplication of the d.f.'s corresponds a real multiplication
of the c.f.'s:

$$\varphi(z) = \varphi_1(z)\, \varphi_2(z).$$

The operation of composition is both commutative and associative, so that
any symbolical product $F = F_1 * F_2 * \cdots * F_n$ is uniquely defined and independent
of the order of the components. When at least one of the components is
continuous (absolutely continuous), the same holds for the composite, and in
many cases it is true that the composite is at least as regular as the most regular
of the components (Lévy, [58], [63], etc.). However, this statement
does not hold in general, as is shown by an interesting example due to Raikov,
[77], where $F_1$ and $F_2$ are integral analytic functions, while the composite
$F = F_1 * F_2$ is not regular at the origin.
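
For distributions concentrated on finitely many points, the composition reduces to a finite sum, and the multiplication rule for the c.f.'s can be verified directly. The following is a minimal sketch under assumptions made here for illustration only: distributions are represented as dictionaries mapping points to masses, and the names `compose`, `cf`, `die`, `coin` and `total` are hypothetical.

```python
import cmath
from fractions import Fraction


def compose(p1, p2):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for a, pa in p1.items():
        for b, pb in p2.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def cf(p, z):
    """c.f. of a discrete distribution given as {point: mass}."""
    return sum(float(mass) * cmath.exp(1j * z * x) for x, mass in p.items())


# Two illustrative components: a fair die and a fair 0-1 coin.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

total = compose(die, coin)
```

Here `compose(coin, die)` gives the same dictionary as `total`, reflecting commutativity, and `cf(total, z)` agrees with `cf(die, z) * cf(coin, z)` up to rounding for any real z.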

It seems to be an important unsolved problem to find convenient restrictions
ensuring the validity of the above statements of the “smoothing effect” of
the operation of composition.

When $F = F_1 * F_2$, we may say that $F$ is “divisible” by each component $F_1$
and $F_2$, and it seems natural to try to develop a theory of symbolical
factorization for d.f.'s. In this connection, it is important to note that symbolical
division is not unique. In fact, Khintchine has shown by an example that it is
possible to find the d.f.'s $F$, $F_1$, $F_2$, and $F_3$ such that

$$F = F_1 * F_2 = F_1 * F_3,$$

while $F_2 \ne F_3$. Another fundamental problem belonging to this order of ideas
is to decide whether a given d.f. $F$ is decomposable or not. $F$ is called
decomposable, if there is at least one representation of the form $F = F_1 * F_2$, where
each component $F_\nu$ has more than one point of increase. So far, this problem
has only been solved in very special cases, and the general problem still
remains open for research. A particular case of some interest would be to know
if there exists an absolutely continuous and indecomposable d.f., such that
$F(a) = 0$ and $F(b) = 1$ for some finite $a$ and $b$.

As soon as we restrict ourselves to certain special classes of distributions,
it is possible to reach results of a more definite character concerning the
factorization problems. Some results of this type will be considered below.


9. Closed families of distributions. The fact that certain families of
distributions are closed with respect to the operation of composition has played
an important part in many applications. If $F_1$ and $F_2$ belong to a family of
this character, so does the symbolical product $F = F_1 * F_2$. We first give some
simple examples of such families.

The normal distribution. The d.f. $F$ has the form $F = \Phi\left(\frac{x - m}{\sigma}\right)$, where
$\sigma > 0$, and

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

The c.f. corresponding to $F$ is $e^{miz - \frac{1}{2}\sigma^2 z^2}$, and it follows that for any real $m_1$,
$m_2$ and any positive $\sigma_1$, $\sigma_2$ we have

$$\Phi\left(\frac{x - m_1}{\sigma_1}\right) * \Phi\left(\frac{x - m_2}{\sigma_2}\right) = \Phi\left(\frac{x - m}{\sigma}\right),$$

where

$$m = m_1 + m_2, \qquad \sigma^2 = \sigma_1^2 + \sigma_2^2.$$

The Poisson distribution. Here the d.f. is $F = F(x; \lambda, m, a)$ where $\lambda > 0$,
$a \ne 0$, and $F$ is a step-function with a jump equal to $\frac{\lambda^{\nu}}{\nu!} e^{-\lambda}$ in the point
$x = m + \nu a$, where $\nu = 0, 1, \ldots$. The corresponding c.f. is $e^{miz + \lambda(e^{iaz} - 1)}$, and it
follows that for any fixed $a$ we have

$$F(x; \lambda_1, m_1, a) * F(x; \lambda_2, m_2, a) = F(x; \lambda_1 + \lambda_2, m_1 + m_2, a).$$
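
The closure of the Poisson family can be checked numerically from the jumps themselves. The sketch below is an added illustration with hypothetical names: it truncates each step-function to its first 60 jumps (an assumption that is harmless for the moderate values of lambda used), composes the jump sequences, and compares the result with the jumps for lambda_1 + lambda_2. It also compares the c.f. summed from the jumps with the closed form stated above.

```python
import cmath
import math


def poisson_masses(lam, terms=60):
    """Jumps lam**v / v! * exp(-lam) for v = 0, ..., terms - 1."""
    return [lam ** v / math.factorial(v) * math.exp(-lam) for v in range(terms)]


def compose_masses(p, q):
    """Jumps of the composition; index v sits at (m1 + m2) + v * a."""
    n = min(len(p), len(q))
    return [sum(p[j] * q[v - j] for j in range(v + 1)) for v in range(n)]


def cf_from_jumps(masses, m, a, z):
    """c.f. obtained by summing over the jump points x = m + v * a."""
    return sum(w * cmath.exp(1j * z * (m + v * a)) for v, w in enumerate(masses))


def cf_closed_form(lam, m, a, z):
    """The closed form exp(m*i*z + lam*(exp(i*a*z) - 1))."""
    return cmath.exp(1j * m * z + lam * (cmath.exp(1j * a * z) - 1))


lam1, lam2 = 1.5, 2.0   # illustrative parameter values
composed = compose_masses(poisson_masses(lam1), poisson_masses(lam2))
direct = poisson_masses(lam1 + lam2)
closure_gap = max(abs(c - d) for c, d in zip(composed, direct))
cf_gap = abs(cf_from_jumps(direct, 0.5, 2.0, 0.7)
             - cf_closed_form(lam1 + lam2, 0.5, 2.0, 0.7))
```

Both `closure_gap` and `cf_gap` are at the level of floating-point rounding. The truncation does not affect the comparison of jumps, since the jump of the composition at index v only involves indices up to v.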



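The Pearson Type III distribution.

$$F = F(x; \alpha, \lambda) = \frac{\alpha^{\lambda}}{\Gamma(\lambda)} \int_0^x t^{\lambda - 1} e^{-\alpha t}\, dt \qquad (x > 0).$$

The corresponding c.f. is $\left(1 - \frac{iz}{\alpha}\right)^{-\lambda}$, and for any fixed $\alpha > 0$ and any
positive $\lambda_1$ and $\lambda_2$ we have

$$F(x; \alpha, \lambda_1) * F(x; \alpha, \lambda_2) = F(x; \alpha, \lambda_1 + \lambda_2).$$

Stable distributions. We shall say that a closed family is stable, when all
its members are of the form $F(ax + b)$, where $F$ is a d.f., while $a > 0$ and $b$ are
constants. Obviously the normal family is an example of a stable family. It
has been shown by Lévy and Khintchine [49], that a d.f. $F(x)$ generates a stable
family when and only when the logarithm of its c.f. is of the form

$$(9.1) \qquad \log \varphi(z) = \beta i z - \gamma |z|^{\alpha} \left(1 + i\delta\, \frac{z}{|z|}\, \omega(z, \alpha)\right),$$

where $\alpha$, $\beta$, $\gamma$, $\delta$ are real constants such that

$$0 < \alpha \le 2, \qquad \gamma > 0, \qquad |\delta| \le 1,$$

while

$$\omega(z, \alpha) = \begin{cases} \tan \dfrac{\pi \alpha}{2} & \text{for } \alpha \ne 1, \\[2mm] \dfrac{2}{\pi} \log |z| & \text{for } \alpha = 1. \end{cases}$$

For $\alpha = 2$ we obtain the normal family.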

A more general and very important closed family is the family I of infinitely
divisible distributions. A d.f. F belongs to I if to every $n = 1, 2, \cdots$ there
exists a d.f. G such that $F = G^{[n]}$, where $G^{[n]}$ denotes the symbolical nth power
of G. Obviously the family I is a closed family which contains all the families
mentioned above. Lévy [60], [63], has shown that F is infinitely divisible when
and only when the logarithm of its c.f. is of the form

(9.2)  $$\log \varphi(z) = \beta i z - \gamma z^2 + \int_{-\infty}^{0} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dM(u) + \int_{0}^{\infty} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dN(u),$$

where $\beta$ and $\gamma \ge 0$ are real constants, while M(u) and N(u) are non-decreasing
functions such that

$$M(-\infty) = N(+\infty) = 0,$$

$$\int_{-a}^{0} u^2\, dM(u) < \infty \quad \text{and} \quad \int_{0}^{a} u^2\, dN(u) < \infty$$

for any finite a > 0. When M and N reduce to zero, we obtain the normal
family. When $\gamma = 0$ and one of the functions M and N reduces to zero, while
the other is a step-function with a single jump equal to $\lambda$ at the point x = a,
we obtain a Poisson family. Generally, it follows from (9.2) that any infinitely 
divisible distribution may be regarded as a product of a normal distribution 
and a finite, enumerable or continuous set of Poisson distributions. 
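The defining property of the family I can be illustrated on the Poisson case. Since composition of d.f.'s corresponds to multiplication of c.f.'s, a d.f. F is the symbolical nth power of G exactly when its c.f. is the nth power of the c.f. of G. The Python sketch below is only a numerical check at a few arbitrarily chosen points z; the function name `poisson_cf` and the parameter values are assumptions made for the example. It compares the c.f. of F(x; λ, m, a) with the nth power of the c.f. of F(x; λ/n, m/n, a).

```python
import cmath

def poisson_cf(z, lam, m=0.0, a=1.0):
    """c.f. exp(m*i*z + lam*(exp(a*i*z) - 1)) of the Poisson d.f. F(x; lam, m, a)."""
    return cmath.exp(1j * m * z + lam * (cmath.exp(1j * a * z) - 1))

lam, m, a, n = 3.0, 0.5, 2.0, 7
z_values = [-2.0, -0.3, 0.0, 0.7, 1.9]

# Distance between the c.f. of F and the n-th power of the c.f. of G = F(x; lam/n, m/n, a).
gaps = [abs(poisson_cf(z, lam, m, a) - poisson_cf(z, lam / n, m / n, a) ** n)
        for z in z_values]
max_gap = max(gaps)
print(max_gap)
```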

The representation of $\log \varphi(z)$ in the form (9.2) is unique. It follows that
the problem of finding all possible factorizations of an infinitely divisible d.f. F 
can be completely solved, as long as we restrict ourselves to factors which are 
themselves infinitely divisible. In fact, in order that 

$$F = F_1 * F_2,$$

where all three d.f.’s belong to I, it is necessary and sufficient that the logarithms
of the corresponding c.f.’s should be of the form (9.2), with

$$\beta = \beta_1 + \beta_2, \qquad \gamma = \gamma_1 + \gamma_2,$$

$$M = M_1 + M_2, \qquad N = N_1 + N_2.$$

In the two simple cases of the normal and the Poisson distributions, the 
decompositions obtained in this way remain the only possible, even if we remove 
the restriction that the factors should belong to I. Thus in any factorization 
of a normal distribution, all factors are normal (Cramér, [8]), while in any factorization
of a Poisson distribution, all factors belong to the Poisson family
(Raikov, [75]). For the type III distribution, and the non-normal stable distributions,
however, the corresponding property does not hold.

In some cases, an infinitely divisible distribution may be represented as a 
product of indecomposable distributions, or as a product of an indecomposable 
distribution and another infinitely divisible distribution. The results so far 
obtained in this direction (Lévy, [63], [64]; Khintchine, [46], [47]; Raikov, [76])
are all concerned with more or less particular cases, and the general factoriza¬ 
tion problem for infinitely divisible distributions still remains unsolved. A 
particular case of some interest would be the case when the functions M and N 
are both absolutely continuous. There does not seem to have been given any 
example of this type, where a factor not belonging to I may occur. 2 

Finally we mention a general theorem due to Khintchine, [46], which asserts 
that an arbitrary d.f. F may be represented in one of the forms 

$$F = G, \qquad F = H \qquad \text{or} \qquad F = G * H,$$

where G is infinitely divisible, while H is a finite or infinite product of indecomposable
factors. This seems to be practically the only result so far known
concerning the factorization of a general distribution. 

A certain number of the results mentioned above have been generalized to 
multi-dimensional distributions. 


2 While the present paper was being printed, I have proved that such factors do occur,
as soon as at least one of the derivatives M' and N' is bounded away from zero in some
interval (-a, 0) or (0, a).





10. The Laws of large numbers. In modern terminology, the classical 
Bernoulli theorem may be expressed in the following way. Let $x_1, x_2, \cdots$ be
a sequence of independent variables, such that each $x_\nu$ may only assume the
values 1 and 0, the corresponding probabilities being p and $q = 1 - p$. Then
the arithmetic mean

(10.1)  $$\frac{z_n}{n} = \frac{x_1 + \cdots + x_n}{n}$$

converges in probability to p, as $n \to \infty$.
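The Bernoulli theorem lends itself to a simple simulation. The Python sketch below is an illustration, not a proof: the seed, the value p = 0.3 and the helper name `arithmetic_mean` are arbitrary choices made for the example. It draws independent variables taking the values 1 and 0 with probabilities p and 1 - p, and prints the arithmetic mean for increasing n together with its distance from p.

```python
import random

def arithmetic_mean(n, p, rng):
    """Mean z_n / n of n independent variables equal to 1 with probability p, else 0."""
    return sum(1 for _ in range(n) if rng.random() < p) / n

rng = random.Random(1947)   # fixed seed so that the run is reproducible
p = 0.3
means = {n: arithmetic_mean(n, p, rng) for n in (100, 10_000, 200_000)}
for n, value in means.items():
    print(n, value, abs(value - p))
```

A single run only shows one realization for each n; convergence in probability is a statement about how rarely large deviations occur as n grows.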

Both classical and modern authors have devoted much work to the generalization
of this simple result in various directions. Generally, we shall say
that a sequence of random variables $x_1, x_2, \cdots$ satisfies the Weak Law of Large
Numbers if there exist two sequences of constants $a_1, a_2, \cdots$ and $b_1, b_2, \cdots$,
such that $a_n > 0$, and

$$\frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n}$$

converges in probability to zero.

Let $x_1, x_2, \cdots$ be independent variables, such that $x_\nu$ has the d.f. $F_\nu(x)$.
It has been shown by Feller [27] that for any given sequence $a_1, a_2, \cdots$, the
conditions

(10.2)  $$\sum_{\nu=1}^{n} \int_{|x| > a_n} dF_\nu(x) = o(1), \qquad \sum_{\nu=1}^{n} \int_{|x| < a_n} x^2\, dF_\nu(x) = o(a_n^2),$$

are sufficient for the validity of the weak law of large numbers, and that the
corresponding sequence $b_1, b_2, \cdots$ can be defined by

$$b_n = \sum_{\nu=1}^{n} \int_{|x| < a_n} x\, dF_\nu(x).$$

When there is a constant c > 0 such that for all $\nu$

(10.3)  $$F_\nu(+0) > c, \qquad F_\nu(-0) < 1 - c,$$


the conditions are also necessary. This theorem contains as particular cases 
all previously known results in this direction. A simple NS condition for the 
existence of at least one sequence $a_1, a_2, \cdots$ such that (10.2) holds does not seem
to be known. 

When the weak law is satisfied, this means that, for any given $\varepsilon > 0$ and for
any fixed large n, there is a probability very near to 1 that the sum $z_n = x_1 + \cdots + x_n$
will fall between the limits $b_n \pm \varepsilon a_n$. The more stringent condition
that, with a probability tending to 1 as $n \to \infty$, $z_\nu$ will fall between the limits
$b_\nu \pm \varepsilon a_\nu$ for all values of $\nu \ge n$ is equivalent to the condition that
$\frac{z_n - b_n}{a_n}$ converges almost certainly to zero. When this holds, we shall say that the variables
$x_\nu$ satisfy the Strong Law of Large Numbers. The most important result so far
known in this connection is concerned with the case $a_n = n$, and is expressed
by the following theorem (Kolmogoroff, [52], [55]):

When the $x_\nu$ are independent and (10.3) holds, a sufficient condition for the validity
of the strong law with $a_n = n$ consists in the simultaneous convergence of the
two series

$$\sum_{n} \int_{|x| > n} dF_n(x) \qquad \text{and} \qquad \sum_{n} \frac{1}{n^2} \int_{|x| < n} x^2\, dF_n(x).$$

Some improved conditions of this type have been given by Marcinkiewicz
and Zygmund, [73], but the problem of finding a NS condition for the strong
law is still unsolved, even in the case $a_n = n$.

Important generalizations of the laws of large numbers to cases when the
$x_\nu$ are not assumed to be independent have been given i.a. by Khintchine [44],
Lévy [62], [63] and Loève [67].

11. The central limit theorem and allied theorems. It was already known
to De Moivre that, in the case (10.1) of the Bernoulli distribution, the d.f. of
the normalized sum

$$\frac{x_1 + \cdots + x_n - np}{\sqrt{npq}}$$

tends, as $n \to \infty$, to the normal d.f. $\Phi(x)$. Considerably more general results
in this direction were stated by Laplace. After a long series of more or less
successful attempts, a rigorous proof of the main statements of Laplace was
given in 1901 by Liapounoff, [65]. More general cases were later considered i.a.
by Lindeberg [66], Lévy [61], [63], Khintchine [43] and Feller, [25]. The following
final form of the Central Limit Theorem is due to Feller.
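De Moivre's result for the Bernoulli case can be examined without any simulation, since the d.f. of the normalized sum is a step-function with known jumps. The Python sketch below is an illustration under the assumption p = 0.3; the function names are hypothetical. It computes the largest distance between that d.f. and the normal d.f., which for a step-function is attained at a jump point (either just before or just after the jump).

```python
import math

def normal_df(x):
    """Normal d.f. Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_jumps(n, p):
    """Probabilities P(x_1 + ... + x_n = k), k = 0..n, computed through logarithms."""
    q = 1.0 - p
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(q))
            for k in range(n + 1)]

def max_distance_to_normal(n, p):
    """Largest gap between the d.f. of the normalized sum and Phi, checked at every jump."""
    scale = math.sqrt(n * p * (1.0 - p))
    below, worst = 0.0, 0.0
    for k, jump in enumerate(binomial_jumps(n, p)):
        phi = normal_df((k - n * p) / scale)
        worst = max(worst, abs(below - phi), abs(below + jump - phi))
        below += jump
    return worst

distances = {n: max_distance_to_normal(n, 0.3) for n in (25, 100, 400)}
for n, d in distances.items():
    print(n, d)
```

The printed distances decrease as n grows, in agreement with the limit statement; the sketch says nothing about the rate beyond these three values of n.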

Consider the expression

(11.1)  $$u_n = \frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n},$$

where the $x_\nu$ are independent variables. We shall say that the $x_\nu$ obey the
central limit law, if the sequences $\{a_n\}$ and $\{b_n\}$ can be found such that the
d.f. of $u_n$ tends to $\Phi(x)$ as $n \to \infty$. In order to avoid unnecessary complications,
we shall restrict ourselves to sequences $\{a_n\}$ such that

$$a_n \to +\infty, \qquad \frac{a_{n+1}}{a_n} \to 1,$$

and we shall assume that the conditions (10.3) are satisfied. Then Feller’s
theorem runs as follows:





The independent variables $x_1, x_2, \cdots$ obey the central limit law if, and only if,
there exists a sequence $q_n \to \infty$ such that simultaneously

(11.2)  $$\sum_{\nu=1}^{n} \int_{|x| > q_n} dF_\nu(x) \to 0, \qquad \frac{1}{q_n^2} \sum_{\nu=1}^{n} \int_{|x| < q_n} x^2\, dF_\nu(x) \to \infty.$$

When these conditions are satisfied, explicit expressions for the $a_n$ and $b_n$ can be
obtained.

Feller’s theorem gives a complete solution of the problem. However, we 
might still try to express in a more direct way the condition that the $q_n$ should
exist. We may also ask what happens when the conditions (11.2) are not 
satisfied. Some particular cases of the latter question will be considered below. 
However, very few general results are known in this direction. 

The central limit theorem has been extended in various directions. Bernstein
[3], Lévy [62], [63], Loève [67] and others have considered cases where the
$x_\nu$ are not assumed to be independent. Important results have been reached
but still much remains to be done.

On the other hand, several authors have considered symmetrical functions, 
other than sums, of n independent random variables. The problem of investi¬ 
gating the asymptotic behaviour of the distributions of such functions, as n 
tends to infinity, is of great importance in the theory of statistical sampling 
distributions. It is known (cf. e.g. Cramér, [15]) that under certain general
regularity conditions there exists a normal limiting distribution. However, it 
is also known that it is possible to give examples of particular functions (such 
as e.g. the function which is equal to the largest of the n variables), where there 
exist limiting distributions which are non-normal. The conditions under 
which this phenomenon may occur seem to deserve further study. 

A further problem belonging to the same order of ideas is to find a closer
asymptotic representation of the d.f. of the standardized sum $z_n$ than that provided
by the normal function $\Phi(x)$. Consider e.g. the simple case when the $x_\nu$
are independent variables all having the same d.f. F(x) with a finite mean m, a
finite variance $\sigma^2$, and finite moments up to a certain order $k \ge 3$. Let $G_n(x)$
be the d.f. of the variable

$$\frac{x_1 + \cdots + x_n - nm}{\sigma \sqrt{n}}.$$

It then follows from a theorem of Cramér [5], [9] that, as soon as the d.f. F(x)
contains an absolutely continuous component, there is an asymptotic expansion

(11.3)  $$G_n(x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{p_\nu(x)}{n^{\nu/2}}\, e^{-x^2/2} + O\left(n^{-(k-2)/2}\right),$$

where the $p_\nu(x)$ are polynomials in x and the constant implied by the O is
independent of n and x. Cramér has
also given similar expansions in more general cases, and his results have been
further extended by P. L. Hsu [39], who deduces analogous expansions also for
other functions than sums. The most general conditions under which expansions
of this type exist are still unknown.

It follows from (11.3) that the difference $G_n(x) - \Phi(x)$ is, for any fixed x,
of the order $n^{-1/2}$ as $n \to \infty$. It is often important to know the asymptotic
behaviour of $G_n(x)$ when n and x increase simultaneously, and in that case (11.3)
yields only a trivial result. This case has been investigated by Cramér [10],
and Feller [29], and the results so far obtained permit important applications to 
the so called law of the iterated logarithm (cf. below). However, it seems likely 
that similar results may be obtained in considerably more general cases than those 
hitherto investigated. 

A further interesting type of problem belonging to this order of ideas may
be approached in the following way. Consider the variables (11.1) in the particular
case when $x_1, x_2, \cdots$ are independent variables all having the same
d.f. F(x). When the $a_n$ and $b_n$ can be found such that the d.f. of the normalized
sum $u_n$ tends to $\Phi(x)$, we shall say that F belongs to the domain of attraction of
the normal law. Feller’s theorem gives a NS condition that this should be so.
Now when this condition is not satisfied, it may still occur that the $a_n$ and $b_n$
can be so chosen that the d.f. of $u_n$ tends to a limiting d.f. $\Psi(x)$, which is necessarily
different from $\Phi(x)$. Then it is easily seen that $\Psi(x)$ must be a stable
distribution, with its c.f. defined by (9.1), and it is natural to say that F belongs
to the domain of attraction of $\Psi$. Necessary and sufficient conditions that this should
hold have been given by Doeblin [16], and Gnedenko [34]. When the $a_n$ and
$b_n$ cannot be found such that the d.f. of the normalized sum $u_n$ converges to a limit,
it may still be possible to obtain a limiting d.f. by considering only a partial
sequence $u_{n_1}, u_{n_2}, \cdots$. Khintchine [47] has proved the interesting theorem
that the totality of limiting d.f.’s that may be obtained in this way coincides
with the class of infinitely divisible d.f.’s defined by (9.2). There are also
further results in the same direction given by Bawly [2], Khintchine [44], Lévy,
[61]-[63], and Gnedenko, [35].

12. The law of the iterated logarithm. Consider a sequence of independent
variables $x_1, x_2, \cdots$, such that the mean $E x_n = 0$ for all n, while the variances
$E x_n^2 = \sigma_n^2$ are finite. Put $s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$, and suppose that the variables
obey the central limit law with $a_n = s_n$, $b_n = 0$. (In particular this will be
the case when all $x_n$ have the same distribution.) For any function $\psi(n)$ tending
to infinity with n we then have

(12.1)  $$\lim_{n \to \infty} P(|z_n| > s_n \psi(n)) = 0.$$

On the other hand, if $\psi(n)$ tends to a finite limit > 0, the same probability
has a positive limit.

It seems natural to consider the relation within the brackets in (12.1) not
only for a single large value of n, but to ask for the probability that this relation
holds simultaneously for an infinite number of values of n. The development
of this problem has led to the so called law of the iterated logarithm.

We shall in this respect use the following terminology due to Lévy. A non-decreasing
positive function $\psi(n)$ will be said to belong to the lower class with
respect to the variables $x_n$ if, with a probability equal to one, there are infinitely
many n such that

$$|z_n| > s_n \psi(n).$$

On the other hand, $\psi(n)$ will be said to belong to the upper class if the probability
of the same property is equal to zero.

Every $\psi(n)$ belongs to one of these two classes. This is a special case of the
so called null-or-one law: if S is a Borel set in the space of the independent random
variables $x_1, x_2, \cdots$, such that any two points differing at most in a finite number
of coordinates either both belong to S or both belong to the complementary
set, then P(S) can only assume the values 0 or 1.

It was proved by Kolmogoroff [51] that, subject to certain restrictions, the
function

$$\psi(n) = \sqrt{c \log \log s_n}$$

belongs to the lower class for any c < 2, and to the upper class for any c > 2,
which may be expressed by the relation

(12.1)  $$P\left(\limsup_{n \to \infty} \frac{|z_n|}{s_n \sqrt{2 \log \log s_n}} = 1\right) = 1.$$

More general results were proved by Feller [30], who proved i.a. that, subject to
certain restrictions, $\psi(n)$ belongs to the lower or upper class according as

(12.2)  $$\sum_{n} \frac{\sigma_n^2}{s_n^2}\, \psi(n)\, e^{-\psi^2(n)/2}$$

is divergent or convergent (in certain special cases, this had been previously
found by Kolmogoroff and Erdös [24]). Feller also proved a more complicated
result, which contains the above as a particular case, and from which
it follows that the simple criterion (12.2) no longer holds when the restrictions
imposed in its proof are removed.


13. Convergence of series. For any sequence of random variables $x_n$, the
probability

$$P\left(\sum_{n} x_n \text{ converges}\right)$$

has a uniquely determined value. When the $x_n$ are independent, it follows from
the null-or-one law that this probability is either 0 or 1. By a theorem of
Khintchine and Kolmogoroff [48], the value 1 is assumed when and only when
the three series

$$\sum_{n} \int_{|x| > 1} dF_n(x), \qquad \sum_{n} E y_n, \qquad \sum_{n} E(y_n - E y_n)^2$$

are convergent, where

$$y_n = \begin{cases} x_n & \text{when } |x_n| \le 1, \\ 0 & \text{when } |x_n| > 1. \end{cases}$$
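The three-series criterion can be made concrete on a family chosen purely for illustration (it is not taken from the text): let $x_n$ take the values $\pm n^{-e}$ with equal probabilities, for a fixed exponent e > 0. Then $|x_n| \le 1$, so $y_n = x_n$, the first series vanishes term by term, $E y_n = 0$, and only the series of variances $\sum n^{-2e}$ decides the matter. The Python sketch below, with hypothetical helper names, prints partial sums of that series and one simulated path of the partial sums of $\sum x_n$ for e = 1 (variance series convergent) and e = 1/2 (variance series divergent).

```python
import random

def variance_partial_sum(exponent, n_terms):
    """Partial sum of the series of variances for x_n = +-n**(-exponent), equally likely signs."""
    return sum(n ** (-2.0 * exponent) for n in range(1, n_terms + 1))

def sampled_partial_sums(exponent, checkpoints, rng):
    """One sample path of the partial sums of sum x_n, recorded at the checkpoints."""
    total, recorded = 0.0, {}
    for n in range(1, max(checkpoints) + 1):
        total += rng.choice((-1.0, 1.0)) * n ** (-exponent)
        if n in checkpoints:
            recorded[n] = total
    return recorded

rng = random.Random(13)     # fixed seed so that the run is reproducible
checkpoints = (1_000, 10_000, 100_000)
for exponent in (1.0, 0.5):
    print(exponent,
          [round(variance_partial_sum(exponent, n), 3) for n in checkpoints],
          sampled_partial_sums(exponent, checkpoints, rng))
```

A finite simulation cannot establish convergence or divergence; it only shows the partial sums of variances levelling off in the first case and growing without visible bound in the second.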

For the case when the x n are not assumed to be independent, various results 
have been given by Lévy [63] and others, but our knowledge of the properties
of these series is still not very advanced. 

14. Generalizations. In several instances it has been pointed out above 
that the results concerning sums of independent variables may, to a certain 
extent, be extended to cases when the variables are not independent. Generally 
the independence condition has then to be replaced by some condition restricting
the degree of dependence. Results of this type were first given by Bernstein
[3], and then in more general cases by Lévy [62], [63], and Loève [67]. However,
this field has so far only been very incompletely explored. 

Similar remarks apply to the generalization of the various theorems quoted 
above to cases of variables and distributions in more than one dimension. 

III. STOCHASTIC PROCESSES 

15. The theory of random variables in a finite number of dimensions is able
to deal adequately with practically all problems considered in classical prob¬ 
ability theory. However, during the early years of the present century, there 
appeared in the applications various problems, where it proved necessary to 
consider probability relations bearing on infinite sequences of numbers, or even 
on functions of a continuous variable. 

The mathematical set-up required for the study of such problems involves 
the introduction of probability distributions in spaces of random sequences or 
random functions (cf. 5 above). Generally, any process in nature which can be 
analyzed in terms of probability distributions in spaces of these types will be 
called a stochastic process. It is convenient to apply this name also to the probability
distribution used for the study of the process. We shall thus say, e.g.,
that a certain random function x(t) is attached to the stochastic process which 
is defined by the probability distribution of x(t). In the majority of applica¬ 
tions, the variable t will represent the time, and we shall often use a terminology 
directly referring to this case. However, there are also other types of problems 
in the applications (t may e.g. be a spatial variable in an arbitrary number of 
dimensions), and it is obvious that the purely mathematical problems connected 
with these classes of probability distributions will have to be considered quite 
independently of any concrete interpretation of the variable t or the function x(t).

A well-known example of this type of problems is afforded by the Brownian 
movement. Let x(t) be the abscissa at the time t of a small particle immersed 
in a liquid, and subject to molecular impacts. In every instant, the quantity 
x(t) receives a random impulse, and the problem arises to study the behaviour 
of x(t). According as we are content to consider x(t) for a discrete sequence
of t-points, say for $t = 0, 1, 2, \cdots$, or we wish to consider all positive values of t,
we shall then have to introduce a probability distribution in the space of the
random sequence $x(0), x(1), \cdots$, or in the space of the random function x(t),
where t > 0. We may then discuss such questions as the distribution of x(t)
for a given value of t, the joint and conditional distributions of x(t) for two or
more values of t, and, in the case of a continuous variable t, continuity, differentiability
and other similar properties of the random function x(t).

Wiener [82], [83] (cf. also Paley and Wiener [74]) was the first to give a rigorous 
treatment of this process. He proved in 1923 that it is possible to define a 
probability distribution in a suitably restricted functional space, such that the 
increment $\Delta x(t) = x(t + \Delta t) - x(t)$ is independent of x(t) for any $\Delta t > 0$.
With a probability equal to 1, the function x(t) is continuous for all t > 0, and 
for any fixed t > 0, the random variable x(t) is normally distributed. 
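The two properties just stated, independent increments and a normal distribution of x(t) for fixed t, suggest a simple discrete approximation. The Python sketch below is an illustration under assumptions that are not in the text: x(0) = 0, the time axis is cut into equal steps, and the increment over a step of length dt is taken normal with mean 0 and variance dt. The helper name `wiener_path` is hypothetical. The sketch estimates the mean and variance of x(t) at the end of the interval from many simulated paths.

```python
import math
import random

def wiener_path(t_max, n_steps, rng):
    """Values x(0), x(dt), ..., x(t_max) built from independent normal increments."""
    dt = t_max / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))  # increment ~ N(0, dt)
    return path

rng = random.Random(1923)   # fixed seed so that the run is reproducible
t_max, n_steps, n_paths = 2.0, 50, 4000
end_points = [wiener_path(t_max, n_steps, rng)[-1] for _ in range(n_paths)]
mean_end = sum(end_points) / n_paths
var_end = sum((v - mean_end) ** 2 for v in end_points) / (n_paths - 1)
print(mean_end, var_end)    # expected to lie near 0 and near t_max respectively
```

Under the stated normalization the variance of x(t) equals t, which is what the second printed number estimates. The continuity of the paths with probability 1 concerns the limiting process and cannot be read off from a discrete sketch of this kind.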

Another example of stochastic processes studied at this stage occurs in the 
theory of risk of an insurance company. Let x(t) denote the total amount 
of claims up to the time t in a certain insurance company. As in the case of 
the Brownian movement, it may seem natural to assume that the increment 
$\Delta x(t)$ is independent of x(t). On the other hand, x(t) is in this case an essentially
discontinuous function, which is never decreasing, and increases only by
jumps of varying magnitudes occurring for certain discrete values of t, which are
not a priori known. Processes of this type were studied by F. Lundberg [69],
[70], H. Cramér [6] and others.
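A process of the kind just described can be sketched in a few lines. The Python example below is an illustration under assumptions that are not in the text: claims are taken to arrive with independent exponential waiting times, and claim amounts are taken to be independent exponential variables. The name `claims_path` and all parameter values are hypothetical. The sketch produces the jump times and the running total x(t) on a finite interval.

```python
import random

def claims_path(t_max, claim_rate, mean_claim, rng):
    """Jump times and running totals of x(t) on [0, t_max].

    Assumptions made only for this sketch: exponential waiting times between
    claims (rate claim_rate) and exponential claim amounts (mean mean_claim).
    """
    times, totals = [0.0], [0.0]
    t = rng.expovariate(claim_rate)
    while t <= t_max:
        times.append(t)
        totals.append(totals[-1] + rng.expovariate(1.0 / mean_claim))
        t += rng.expovariate(claim_rate)
    return times, totals

rng = random.Random(6)      # fixed seed so that the run is reproducible
times, totals = claims_path(t_max=10.0, claim_rate=3.0, mean_claim=2.0, rng=rng)
print(len(times) - 1, "claims, total amount", round(totals[-1], 2))
```

Between two consecutive jump times the total stays constant, so the pair of lists describes a never decreasing step-function, as in the text.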

Further examples of particular processes were discussed in connection with 
various applications, but no general theory of the subject existed until 1931, 
when Kolmogoroff published a basic paper [53] dealing with the class of stochastic 
processes which will here be denoted as Markoff processes (Kolmogoroff uses the 
term “stochastically definite processes”), of which the two examples mentioned 
above form particular cases. The theory of this class of processes was further 
developed by Feller [26], [28]. In 1934, Khintchine [42] introduced another 
important class of processes known as stationary processes. From 1937, the 
general theory of the subject was subjected to a penetrating analysis in a series 
of important works by Doob [18]-[22]. 3 

16. Probability distributions in functional spaces. We have seen in 5 
above how a probability distribution in the space of all functions x(t) may be 
defined, when t varies in an arbitrary space T. Generally, we shall here con¬ 
tent ourselves to consider the cases when T is the set of all real numbers, or the 
set of all non-negative real numbers. Most results obtained for these cases 
will be readily generalized to cases when t varies in a Euclidean space of a finite 
number of dimensions. On the other hand, when T is enumerable, say consisting
of the points $t = 0, \pm 1, \pm 2, \cdots$, so that we are concerned with a random
sequence $x(0), x(\pm 1), \cdots$, the results for the continuous case will generally
hold and assume a simpler form which will not be particularly stated here.

3 A further interesting paper by Doob has appeared while the present paper was being
printed: “Probability in function space”, Bull. Amer. Math. Soc., Vol. 53 (1947), pp. 15-30. 





The case when T is a space of an infinite number of dimensions does not seem 
to have been considered so far. 

In the present paragraph, it will be convenient to assume the function x(t) 
to be real-valued, but the generalization to a complex-valued x(t) requires 
only obvious modifications. In the sequel we shall sometimes consider the 
real-valued and sometimes the complex-valued case, according as the occasion 
requires. 

Let now X be the space of all real-valued functions x(t) of the real variable
t, where $-\infty < t < \infty$. According to 5, a probability measure P(S) is uniquely
defined for all Borel sets S in X by means of the family of joint distributions
of all finite sequences $x(t_1), \cdots, x(t_n)$. In fact, P(S) can be defined for a more
general class of sets than the Borel sets. For any set S in X, we may define
an outer P-measure $\overline{P}(S)$ as the lower bound of P(Z) for all sums Z of finite or
enumerable sequences of intervals, such that $S \subset Z$. Further, the inner P-measure
$\underline{P}(S)$ is defined by the relation $\underline{P}(S) = 1 - \overline{P}(X - S)$. When the
outer and inner measures are equal, S is called P-measurable, and P(S) is defined
as their common value. Any P-measurable set differs from a Borel set by a
set of P-measure zero.

In many cases, this definition will be sufficient for an adequate treatment 
of the problems that we wish to consider. However, in other cases we encounter 
certain characteristic difficulties, which make it desirable to consider the pos¬ 
sibility of amending the basic definition. Thus it often occurs that we are 
interested in the probability that the random function x(t) satisfies certain 
regularity conditions in a non-enumerable set of points t. We may, e.g., wish 
to consider the probability that x(t) is continuous for all t, that x(t) should
be Lebesgue-measurable for all t, that $x(t) \le k$ for all t, etc. Let S denote the
set of all functions satisfying a condition of this type. It can then be shown
that the inner measure $\underline{P}(S)$ is always equal to zero so that S is never measurable,
except in the (usually trivial) case when $\overline{P}(S) = 0$.

Consequently many interesting probabilities are left undetermined by the 
general definition of a probability distribution in X given above. The pos¬ 
sibility of modifying the definition so as to enable us to study probabilities of 
this type has been thoroughly investigated by Doob [18]. He considers a
subspace $X_0$ of the general functional space X, where $X_0$ is chosen so as to
contain only, or almost only, “desirable” functions, i.e. functions satisfying
such regularity conditions as seem natural with respect to the problem under
investigation. We start from a given probability measure P(S) in X, and ask
if it is possible to define a probability measure in the restricted space $X_0$, which
corresponds in some natural way to the given distribution in X. Let $S_0$ be
a set in $X_0$, and suppose that it is possible to find a P-measurable set S in X
such that $S X_0 = S_0$. According to Doob, a probability measure $P_0$ in $X_0$
is then uniquely defined by the relation

$$P_0(S_0) = P(S)$$

if and only if the condition

$$\overline{P}(X_0) = 1$$

is satisfied.

The problem is thus reduced to finding a subspace $X_0$ of outer P-measure 1,
such that $X_0$ contains only functions of sufficiently regular behaviour. When
this can be done, we can restrict ourselves to consider only functions x(t) belonging
to $X_0$, the probability distribution in this space being defined by the
measure $P_0$. We shall then say that x(t) is a random function, attached to a
stochastic process with the restricted space $X_0$. Doob has obtained a great
number of interesting results in this connection, e.g. with respect to the problem
of choosing $X_0$ such that it contains almost only Lebesgue-measurable functions,
or such that the probability of the relation $x(t) \le k$ has a well-defined value for
all k. In particular he has shown that the last problem can be solved for
any given P-measure. However, our knowledge of the various possibilities
which exist with respect to the choice of $X_0$ is still very incomplete, and it seems
likely that further important results may be reached along this line of research.

An alternative method of introducing probability distributions in functional 
spaces has been used by Wiener [82], [83], (cf. also Paley and Wiener, [74]). 
Consider a given probability measure $\Pi$ in an arbitrary space $\Omega$, defined for all
sets $\Sigma$ of an additive class C. Let $x(t, \omega)$ denote a function (real- or complex-valued,
as the case may be) of the arguments t (real) and $\omega$ (point in $\Omega$), such that
$x(t, \omega)$ for every fixed t becomes a C-measurable function of $\omega$. On the other
hand, when $\omega$ is fixed, $x(t, \omega) = x(t)$ reduces to a function of the real variable t.
Let $X_0$ denote the set of all functions x(t) corresponding in this way to points of
$\Omega$. Further, let $S_0 = S X_0$, where S is a Borel set in X, and let $\Sigma$ denote the set
of all points $\omega$ such that $x(t, \omega) \subset S_0$. Then $\Sigma$ belongs to C, and a probability
measure $P_0$ in the functional space $X_0$ is uniquely defined by the relation

(16.1)  $$P_0(S_0) = \Pi(\Sigma).$$

The relations between the two modes of definition have been discussed by 
Doob and Ambrose [23] who have shown that they are largely equivalent. 
However, it seems likely that in particular problems the one or the other pro¬ 
cedure may sometimes be the more advantageous, and further investigations 
on this subject seem desirable. 

17. Processes with a finite mean square. Consider a stochastic process 
defined by a probability measure P(S) in the space X of all complex-valued 
functions x(t) of the real variable t. For any fixed $t_0$, the random variable
$x(t_0)$ is then a complex-valued function of the variable point x(t) in the space
X, i.e. a point $Q_{t_0}$ in the space $\Omega$ of all complex-valued functions defined on X.
When $t_0$ varies, the point $Q_{t_0}$ describes a “curve” in $\Omega$, which then corresponds
to our stochastic process.





Suppose, in particular, that the mean square 

$$E|x(t)|^2 = \int_X |x(t)|^2\, dP$$

is finite for any fixed value of t. This implies that for fixed t the function
x(t) belongs to $L_2$ over X, relative to the probability measure P. The random
variable x(t) may then be regarded as an element of the Hilbert space H of all
complex-valued functions f belonging to $L_2$ over X, the inner product (f, g) of
two elements f and g being defined by the relation

$$(f, g) = \int_X f \bar{g}\, dP = E(f \bar{g}).$$

The stochastic process to which x(t) is attached then corresponds to a “curve”
in H (Kolmogoroff, [56], [57]), so that the well-known theory of Hilbert space is
available for the study of the process. In particular, convergence in the usual 
metric of Hilbert space is equivalent to convergence in the mean of order 2 for 
random variables. 

Let $H_x$ be the smallest closed linear subspace of H which contains all elements
of the form $a_1 x(t_1) + \cdots + a_n x(t_n)$. If the covariance function

$$r(t, u) = (x(t), x(u)) = E(x(t) \overline{x(u)})$$

is continuous for all real values of t and u, then $x(t) \to x(t_0)$ in the mean, as
$t \to t_0$, and we shall say that the process x(t) is continuous. For any continuous
process, $H_x$ is separable. When g(t) is a continuous non-random function of t,
and x(t) is attached to a continuous stochastic process, the Riemann-Darboux
sums formally associated with the integral

$$\int g(t) x(t)\, dt$$

are easily shown to tend to a limit y, which is an element of $H_x$, i.e. a random
variable. By definition, we may identify the integral with this variable y,
and this integral will possess the essential properties of the ordinary Riemann
integral (Cramér, [12]).

The application of the theory of Hilbert space to stochastic processes seems 
to open very interesting possibilities. Some applications to particular classes 
of stochastic processes will be mentioned below. Further important results belonging
to this order of ideas will be given in a work by K. Karhunen [40], which
is in course of publication. 

18. Relations to ergodic theory. There is a close connection between the 
theory of stochastic processes and ergodic theory. In ergodic theory, as summarized
e.g. in the treatise of E. Hopf [38], we consider an arbitrary space $\Omega$,
and a probability measure $\Pi$, defined for all sets $\Sigma$ belonging to the additive
class C. We further consider a one-parameter group of one-one transformations
of $\Omega$ into itself (a “flow” in $\Omega$) such that the transformation corresponding to
the parameter value t takes the point $\omega = \omega_0$ into $\omega_t$, while $(\omega_t)_u = \omega_{t+u}$. Let
$f(\omega)$ be a given function, defined throughout $\Omega$, and such that $f(\omega_t)$ is C-measurable
for every fixed t. The well-known ergodic theorems due to von Neumann,
Birkhoff, Khintchine and others are then concerned with the asymptotic
behaviours of mean values, which in the classical cases are of the types

$$\frac{f(\omega_0) + f(\omega_1) + \cdots + f(\omega_{n-1})}{n}$$

or

$$\frac{1}{T} \int_0^T f(\omega_t)\, dt,$$

as n or T tends to infinity. (In the case of the latter expression, it is necessary
to introduce some additional condition implying measurability in t.)
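A mean value of the first type can be computed explicitly for a very simple flow. The Python sketch below is an illustration under assumptions that are not in the text: the space is the interval [0, 1) with Lebesgue measure as the probability measure, the flow is the rotation $\omega_t = \omega_0 + t\alpha$ (mod 1) with an irrational $\alpha$, and the function is $f(\omega) = \cos^2(2\pi\omega)$, whose integral over the space equals 1/2. The name `time_average` is hypothetical.

```python
import math

def time_average(f, omega0, alpha, n):
    """Mean (f(w_0) + ... + f(w_{n-1})) / n along the flow w_t = w_0 + t*alpha (mod 1)."""
    return sum(f((omega0 + t * alpha) % 1.0) for t in range(n)) / n

def f(omega):
    """Function on [0, 1) whose mean with respect to Lebesgue measure is 1/2."""
    return math.cos(2.0 * math.pi * omega) ** 2

alpha = math.sqrt(2.0) - 1.0    # irrational rotation number (assumption of the sketch)
space_average = 0.5             # integral of cos^2(2*pi*w) over [0, 1)
averages = {n: time_average(f, 0.1, alpha, n) for n in (10, 1_000, 100_000)}
for n, value in averages.items():
    print(n, value, abs(value - space_average))
```

For this particular flow the mean along the orbit approaches the mean over the space, which is the kind of statement the ergodic theorems make precise under general conditions.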

Writing $x(t, \omega) = f(\omega_t)$, it is seen that to a given transformation group $\omega \to \omega_t$
and a given function $f(\omega)$, there corresponds a stochastic process in the sense of
Wiener’s definition (cf. 16). The space $X_0$ of this process consists of all functions
x(t) representable in the form $x(t) = f(\omega_t)$ when $\omega = \omega_0$ varies over $\Omega$. The
corresponding probability measure $P_0$ is defined by (16.1).

Thus any of the above-mentioned ergodic theorems may be expressed as a 
theorem concerning “temporal” mean values of the types 

$$\frac{x(0) + x(1) + \cdots + x(n-1)}{n}$$

or

$$\frac{1}{T} \int_0^T x(t)\,dt.$$

If, according to some reasonable convergence definition, we may assign a limit
to either of these expressions, as $n$ or $T$ tends to infinity, this limit will be a
random variable, and it is important to find conditions which imply that this
variable has a constant value for “almost all” functions $x(t)$, i.e. for all $x(t)$
except at most a set of $P_0$-measure zero.

In the particular case when $x(0), x(1), \cdots$ are independent variables all
having the same distribution, the classical ergodic theorems yield simple cases
of the laws of large numbers (cf. 10). The mean ergodic theorem of von Neumann
gives the weak law, while the Birkhoff-Khintchine theorem gives the
strong law. Some more general results belonging to this order of ideas will be
mentioned in the sequel.
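
The independent case can be illustrated numerically. The following is only a sketch under stated assumptions: the variables are taken to be uniform on [0, 1) (an arbitrary choice, not from the text), and the names `temporal_mean` and `simulate_iid` are hypothetical helpers. It shows the temporal mean $(x(0) + \cdots + x(n-1))/n$ settling near the common expectation 1/2 as $n$ grows.

```python
import random


def temporal_mean(xs):
    """Return (x(0) + x(1) + ... + x(n-1)) / n for one realized sequence."""
    return sum(xs) / len(xs)


def simulate_iid(n, seed):
    """Draw n independent values, uniform on [0, 1) (illustrative assumption)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    # The temporal mean should approach the common expectation 0.5.
    for n in (10, 1_000, 100_000):
        print(n, temporal_mean(simulate_iid(n, seed=1)))
```

A single run of this kind does not prove anything, of course; it only makes the statement of the strong law concrete for one realization.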

It will be seen that the two theories are largely equivalent, and it seems 
likely that further comparative studies of the methods will be of great value to 
both sides. 

19. Markoff processes. Consider now a stochastic process, defined by a
probability measure $P(S)$ in the space $X$ of all real-valued functions $x(t)$ of the
real variable $t$. For any $t_1 < t_2$, there is a certain conditional probability
$P(x(t_2) \subset S \mid x(t_1) = a_1)$ of the relation $x(t_2) \subset S$, relative to the hypothesis
that $x(t_1)$ assumes the given value $a_1$. Suppose now that this conditional probability
is independent of any additional hypothesis concerning the behaviour of
$x(t)$ for $t < t_1$, so that we have e.g. for any $t_0 < t_1 < t_2$ and for any $a_0$

$$P(x(t_2) \subset S \mid x(t_1) = a_1) = P(x(t_2) \subset S \mid x(t_1) = a_1,\ x(t_0) = a_0).$$

In this case the process is called a Markoff process.

The general theory of this type of processes, which forms a natural gen¬ 
eralization of the classical concept of Markoff chains, has been studied in basic 
works by Kolmogoroff [53] and Feller [26], [28]. Writing 

$$P(x(t) \le \xi \mid x(t_0) = a_0) = F(\xi; t, a_0, t_0),$$

where $t_0 < t$, $F$ will be the distribution function of the random variable $x(t)$,
relative to the hypothesis $x(t_0) = a_0$. Then $F$ satisfies the Chapman-Kolmogoroff
equation

(19.1)  $$F(\xi; t, a_0, t_0) = \int_{-\infty}^{\infty} F(\xi; t, \eta, t_1)\, d_\eta F(\eta; t_1, a_0, t_0),$$

which expresses that, starting from the state $x(t_0) = a_0$, the state $x(t) \le \xi$
must be reached by passing through some intermediate state $x(t_1) = \eta$, where
$t_0 < t_1 < t$. Subject to certain general conditions, it is possible to show that
any solution of this equation satisfies certain integro-differential equations, 
which in some important cases reduce to partial differential equations of para¬ 
bolic type, and that the d.f. F is uniquely determined by these equations. How¬ 
ever, the general conditions mentioned above are in many cases difficult to apply 
to particular classes of processes, and it would be important to have further 
investigations concerning these questions. 

Markoff processes (not belonging to the subclass of differential processes, 
which will be considered in the following paragraph) appear in several important 
applications, e.g. in the theory of cosmic radiation, in certain genetical problems, 
in the theory of insurance risk etc. In these cases, we are often concerned with 
the class of purely discontinuous Markoff processes, where the function x(t) 
only changes its value by jumps. If, in addition, there are only a finite or 
enumerable set of possible values for $x(t)$, the Chapman-Kolmogoroff equation
(19.1) reduces to

(19.2)  $$\pi_{ik}(t_0, t) = \sum_j \pi_{ij}(t_0, t_1)\, \pi_{jk}(t_1, t),$$

where $\pi_{ik}(t_0, t)$ denotes the “transition probability”, i.e. the probability that
$x(t)$ will be in the $k$th state at the time $t$, when it is known to have been in the
$i$th state at the time $t_0$. In matrix form, this equation may be written

(19.3)  $$\Pi(t_0, t) = \Pi(t_0, t_1)\, \Pi(t_1, t),$$

where $\Pi$ denotes the matrix of the $\pi_{ik}$.
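
As a small numerical check of the matrix form (19.3), the sketch below assumes a temporally homogeneous chain with integer time, so that $\Pi(t_0, t)$ is simply a power of a one-step matrix. The three-state matrix `STEP` and all function names are hypothetical choices made for illustration; the general equation does not require homogeneity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition(step, t0, t):
    """Pi(t0, t) for a homogeneous chain: the one-step matrix to the power t - t0."""
    result = identity(len(step))
    for _ in range(t - t0):
        result = matmul(result, step)
    return result


def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical one-step transition matrix; each row sums to 1.
STEP = [[0.9, 0.1, 0.0],
        [0.2, 0.7, 0.1],
        [0.0, 0.3, 0.7]]

if __name__ == "__main__":
    lhs = transition(STEP, 0, 5)
    rhs = matmul(transition(STEP, 0, 2), transition(STEP, 2, 5))
    print(max_abs_diff(lhs, rhs))  # difference is at rounding-error level
```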





When only a sequence of discrete values of $t$ is considered, we have here
the classical case of Markoff chains, which has received a detailed treatment
in the well-known book by Fréchet [32] (cf. also Doob, [19]). The case when $t$
is a continuous variable has been treated by Feller [28], O. Lundberg [71],
Arley [1], and other authors. Some of the most important problems of this
branch of the subject are concerned with the existence of a unique system of
solutions of (19.2) or (19.3), and with the asymptotic behaviour of the solutions
for large values of $t - t_0$. Though important results have been reached,
there still remains much to be done here, and the same thing holds a fortiori
with respect to the analogous problems for general Markoff processes.

20. Differential processes. A particularly interesting case of a Markoff
process arises when, for any $\Delta t > 0$, the increment $\Delta x(t) = x(t + \Delta t) - x(t)$
is independent of $x(\tau)$ for $\tau \le t$. The process is then called a differential process.
Some of the earliest studied stochastic processes belong to this class, which
contains in particular the two examples discussed above in 15. Further cases
of such processes arise e.g. in the theory of radioactive disintegration and in
telephone technique.

Let us suppose that $x(0)$ is identically equal to zero, and that the process is
uniformly continuous in probability in every finite interval $0 \le t \le T$, i.e.
that for any fixed positive $\epsilon$

$$P(|x(t + \Delta t) - x(t)| > \epsilon) \to 0$$

as $\Delta t \to 0$, uniformly for $0 \le t \le T$. Then it follows from the works of Lévy
[60], [63], Khintchine [47] and Kolmogoroff [54] that, for any $t > 0$, the random
variable $x(t)$ has an infinitely divisible distribution, with a characteristic function
$\varphi(z; t)$ given by (9.2), where $\beta$, $\gamma$, $M(u)$ and $N(u)$ may depend on $t$.

In the particularly important case when the distribution of the increment
$x(t + \Delta t) - x(t)$ does not involve $t$, but only depends on the length $\Delta t$ of the
interval, we say that the process is temporally homogeneous, and in this case
we have

$$\log \varphi(z; t) = t \log \varphi(z; 1),$$

so that we obtain the general formula for $\varphi(z; t)$ simply by replacing in (9.2)
$\beta$, $\gamma$, $M(u)$ and $N(u)$ by $t\beta$, $t\gamma$, $tM(u)$ and $tN(u)$ respectively.
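
The relation $\log \varphi(z; t) = t \log \varphi(z; 1)$ can be checked numerically on two standard homogeneous processes. This is a sketch under assumptions: the closed forms used below (a normal process with characteristic function $e^{-\gamma t z^2}$ and a Poisson process with intensity $\lambda$) are textbook examples chosen for illustration, and the parameter values and function names are hypothetical. Integer $t$ is used so that the power $\varphi(z; 1)^t$ involves no choice of branch.

```python
import cmath


def phi_normal(z, t, gamma=0.5):
    """exp(-gamma * t * z^2): normal increments with variance 2 * gamma * t."""
    return cmath.exp(-gamma * t * z * z)


def phi_poisson(z, t, lam=2.0):
    """exp(lam * t * (e^{iz} - 1)): Poisson process with intensity lam."""
    return cmath.exp(lam * t * (cmath.exp(1j * z) - 1))


def homogeneity_gap(phi, z, t):
    """|phi(z; t) - phi(z; 1)^t| for a positive integer t."""
    return abs(phi(z, t) - phi(z, 1) ** t)


if __name__ == "__main__":
    for phi in (phi_normal, phi_poisson):
        print(phi.__name__, homogeneity_gap(phi, 0.7, 3))
```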

When $t \to \infty$, or $t \to 0$, the appropriately normalized distribution of $x(t)$
tends, under certain conditions, to a stable distribution (Cramér [7], Gnedenko
[36]). When this limiting distribution is normal, there are sometimes
even asymptotic expansions analogous to (11.3). Still, the problem of the
asymptotic behaviour of the distribution for large $t$ does not seem to be definitely
cleared up.

Khintchine [41] and Gnedenko [37] have given interesting generalizations
of the law of the iterated logarithm (cf. 12) to processes of the type considered
here.





The continuous process discussed in 15 in connection with the Brownian
movement corresponds to the temporally homogeneous case when $\beta$, $M(u)$ and
$N(u)$ all reduce to zero, so that

$$\varphi(z) = e^{-\gamma t z^2},$$

which shows that the distribution of $x(t)$ is normal, with mean zero and variance
$2\gamma t$.

On the other hand, in the applications to the theory of insurance risk, $\gamma$ is
zero, while $M(u)$ and $N(u)$ are connected with the distribution of the various
magnitudes of claims. In this type of applications, it is often very important
to find the probability that $x(t)$ satisfies an inequality of the form

$$x(t) < a + bt$$

for all values of $t$. It follows from the discussion in 16 that the definition of
a probability of this type is somewhat delicate. The problem, which can be
regarded as an extended form of the classical problem of “the gambler’s ruin,”
has been solved in certain particular cases. It leads to integral equations,
which in the simplest case are of the Volterra, in other cases of the Wiener-Hopf
type (Cramér [6], [13], Segerdahl [79], Täcklind [81]).

21. Orthogonal processes. Consider now the case of a complex-valued
$x(t)$, and suppose that $E|x(t)|^2$ is finite for all $t$. Without restricting the generality,
we may assume that $Ex(t) = 0$ for all $t$.

Suppose now that instead of requiring, as in the case of a differential process,
that the variables $x(\tau)$ and $\Delta x(t)$ should be independent when $\tau \le t$, we only
lay down the less stringent condition that these variables should be non-correlated,
i.e. that

$$E\big(x(\tau)\, \overline{\Delta x(t)}\big) = 0.$$

We then obtain a process which is no longer necessarily of the Markoff type.
The condition implies that, for any two disjoint intervals $(t_1, t_2)$ and $(t_3, t_4)$,
we have

$$E\big[(x(t_2) - x(t_1))\, \overline{(x(t_4) - x(t_3))}\big] = 0,$$

so that the “chords” corresponding to two disjoint “arcs” of the curve in
Hilbert space representing the process are always orthogonal (Kolmogoroff
[56], [57]). A process of this type may accordingly be called an orthogonal
process.

For a process of this type we have, writing $E|x(t)|^2 = F(t)$,
$F(t + \Delta t) - F(t) = E|x(t + \Delta t) - x(t)|^2$, so that $F(t)$ is a never decreasing function of $t$.
If $F(t)$ is bounded for all $t$, we shall say that the orthogonal process is bounded.
For a bounded orthogonal process, the Stieltjes integral

$$\int g(t)\, dx(t),$$

where $g(t)$ is bounded and continuous, may be defined as the limit in the mean
of sums of the form

$$\sum_\nu g(t_\nu)\,\big(x(t_\nu) - x(t_{\nu-1})\big).$$

22. Stationary processes. When we are concerned with a process representing
the temporal development of a system governed by laws which are invariant
under a translation in time, it seems natural to assume that the joint distribution
of any group of variables of the form

(22.1)  $$x(t_1 + \tau), \cdots, x(t_n + \tau)$$

is independent of $\tau$. A process satisfying this condition will be called a stationary
process. If a stochastic process is defined by means of a “flow” $\omega \to \omega_t$
in a space $\Omega$ (cf. 18), the process will be stationary when and only when the
corresponding flow is measure-preserving, i.e. if the transformation $\omega \to \omega_t$
changes any $C$-measurable set $S$ into a set $S_t$ of the same measure.

Under appropriate conditions with respect to the measurability of $x(t)$, the
Birkhoff-Khintchine ergodic theorem holds for a stationary process, i.e. there
exists a random variable $y$ such that we have

(22.2)  $$P_0\left(\lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, dt = y\right) = 1,$$

where $P_0$ is the probability measure in a suitably restricted space in the sense
of Doob. Further work seems to be required here, in order to make the situation
quite clear, also with regard to metric transitivity.

For a stationary process, any finite moment of the joint distribution of the
variables (22.1) is obviously independent of $\tau$. Suppose now that we only require
that this invariance under translations in time should hold for moments
of the first and second order of the joint distributions, which are assumed to
be finite. The wider class of processes obtained in this way may be called
stationary of the second order. Processes of this type have been studied for the
first time by Khintchine [42]. We shall assume that $x(t)$ is complex-valued.
Without restricting the generality, we may further assume that $Ex(t) = 0$ for
all $t$. The product moment $E(x(t)\overline{x(u)})$ will then be a function of the difference
$t - u$:

(22.3)  $$E\big(x(t)\,\overline{x(u)}\big) = R(t - u).$$

Assuming, in addition, that $R(t)$ is continuous at $t = 0$, it follows that $R(t)$
is continuous for all $t$, and the process is continuous in the sense of 17. It was
shown by Khintchine that a necessary and sufficient condition that a given function
$R(t)$ should be associated with a second order stationary and continuous process by means of
the relation (22.3) is that we should have

(22.4)  $$R(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$$

for all $t$, where the spectral function $F(x)$ is real, never decreasing and bounded.
In particular, we have

$$F(+\infty) - F(-\infty) = R(0) = E|x(t)|^2 = \sigma^2.$$
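
A function of the form (22.4) is necessarily non-negative definite, which is what allows it to serve as a covariance function. The sketch below illustrates this for the simplest case, under assumptions: $F$ is taken to be a step function with three hypothetical jumps (`LAMS`, `SIGMA2`), so that the integral reduces to a finite sum, and the function names are illustrative only.

```python
import cmath

# Hypothetical discrete spectrum: F jumps by SIGMA2[j] at the point LAMS[j].
LAMS = [-1.0, 0.5, 2.0]
SIGMA2 = [0.2, 0.5, 0.3]


def covariance(t):
    """R(t) = sum_j sigma_j^2 * exp(i * lambda_j * t): (22.4) for a step function F."""
    return sum(s * cmath.exp(1j * lam * t) for lam, s in zip(LAMS, SIGMA2))


def hermitian_form(times, coeffs):
    """sum_{j,k} R(t_j - t_k) * c_j * conj(c_k), real and non-negative for such R."""
    total = 0j
    for tj, cj in zip(times, coeffs):
        for tk, ck in zip(times, coeffs):
            total += covariance(tj - tk) * cj * ck.conjugate()
    return total


if __name__ == "__main__":
    print(covariance(0.0))  # equals the total variance sigma^2 = sum of the jumps
    print(hermitian_form([0.0, 0.3, 1.7], [1 + 0j, -2 + 1j, 0.5j]))
```

The form equals $\sum_m \sigma_m^2 \left|\sum_j c_j e^{i\lambda_m t_j}\right|^2$, which makes the non-negativity evident.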

Khintchine’s condition for $R(t)$ was generalized by Cramér to the case of an
arbitrary number of processes $x_1(t), \cdots, x_n(t)$, such that the product moments
$E(x_i(t)\overline{x_j(u)})$ are functions of the difference $t - u$. The corresponding spectral
functions $F_{ij}(x)$ are in general complex-valued and of bounded variation. Further,
the expression (Cramér, [12])

$$\sum_{i,j=1}^{n} z_i \bar{z}_j\, \Delta F_{ij},$$

where $\Delta F_{ij} = F_{ij}(b) - F_{ij}(a)$, is, for any $a < b$, a non-negative Hermite form in
the variables $z_i$. This result is closely connected with a theorem on Hilbert
space considered by Kolmogoroff and Julia. It is further shown that, to any
given functions $F_{ij}(x)$, $(i, j = 1, \cdots, n)$, satisfying these conditions, we can
always find $n$ processes $x_1(t), \cdots, x_n(t)$ such that the joint distribution of any set
of variables $x_i(t_j)$ is always normal, while the covariance functions $R_{ij}(t - u) =
E(x_i(t)\overline{x_j(u)})$ are given by the expression

$$R_{ij}(t) = \int_{-\infty}^{\infty} e^{itx}\, dF_{ij}(x).$$

For a process $x(t)$ which is continuous and stationary of the second order,
with $Ex(t) = 0$ for all $t$, we have the mean ergodic theorem

(22.5)  $$\underset{T \to \infty}{\mathrm{l.i.m.}}\ \frac{1}{2T} \int_{-T}^{T} e^{-i\lambda t}\, x(t)\, dt = y$$

for any real $\lambda$. The random variable $y$ has the mean 0 and the variance $F(\lambda + 0)
- F(\lambda - 0)$, where $F$ is the spectral function appearing in (22.4). If $\lambda$ is a
point of continuity for $F$, it thus follows that $y = 0$ with a probability equal
to 1. On the other hand, if $\lambda$ is a discontinuity, $y$ has a positive variance. Let
$\lambda_1, \lambda_2, \cdots$ be all the discontinuities of $F(x)$, and let $\sigma_1^2, \sigma_2^2, \cdots$ be the corresponding
saltuses, while $y_1, y_2, \cdots$ are the limits in the mean obtained from
(22.5) for $\lambda = \lambda_1, \lambda_2, \cdots$. Then two different $y_j$ are always orthogonal:
$E(y_j \bar{y}_k) = 0$ for $j \ne k$, and we have

(22.6)  $$x(t) = \sum_\nu y_\nu e^{i\lambda_\nu t} + \xi(t),$$

where $E\xi(t) = 0$ and

$$E|\xi(t)|^2 = \sigma^2 - \sum_\nu \sigma_\nu^2.$$

If $F(x)$ is a step-function, we have $\sigma^2 = \sum_\nu \sigma_\nu^2$, and it follows that $\xi(t) = 0$
with a probability equal to 1, so that (22.6) gives a “stochastic Fourier expansion”
of $x(t)$ (Slutsky, [80]).
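
The way the averages in (22.5) pick out the components of (22.6) can be seen on a single sample function. The sketch below is an illustration under assumptions: it takes a function with two hypothetical frequencies `LAMS` and fixed, non-random amplitudes `YS`, and approximates the integral by the trapezoidal rule. At a frequency present in the spectrum the average is close to the corresponding amplitude; at any other frequency it is close to zero.

```python
import cmath

# Hypothetical spectrum points and fixed amplitudes of one sample function.
LAMS = [0.5, 2.0]
YS = [1.0 + 2.0j, -0.5j]


def x(t):
    """Sample function x(t) = sum_j y_j * exp(i * lambda_j * t)."""
    return sum(y * cmath.exp(1j * lam * t) for lam, y in zip(LAMS, YS))


def ergodic_average(lam, T, steps=20000):
    """Trapezoidal approximation of (1/2T) * integral_{-T}^{T} exp(-i*lam*t) x(t) dt."""
    h = 2.0 * T / steps
    total = 0j
    for k in range(steps + 1):
        t = -T + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * lam * t) * x(t)
    return total * h / (2.0 * T)


if __name__ == "__main__":
    print(ergodic_average(0.5, 500.0))  # close to YS[0]
    print(ergodic_average(1.0, 500.0))  # close to 0: no spectral mass at 1.0
```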





Even when $F(x)$ is arbitrary, we can obtain a spectral representation of $x(t)$
generalizing (22.6). In fact, it can be shown (Cramér, [14]) that $x(t)$ can always
be represented by a Fourier-Stieltjes integral

(22.7)  $$x(t) = \int_{-\infty}^{\infty} e^{itu}\, dz(u),$$

where $z(u)$ is a random function attached to a bounded orthogonal process
(cf. 21), such that

$$E|z(u + \Delta u) - z(u)|^2 = F(u + \Delta u) - F(u).$$

Conversely, we have

(22.8)  $$z(u + \Delta u) - z(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-it(u + \Delta u)} - e^{-itu}}{-it}\, x(t)\, dt,$$

so that there is a one-one correspondence between $x(t)$ and $z(u)$. The integrals
(22.7) and (22.8) are defined as limits in the mean, as shown above in 17 and 21.
These results are in close correspondence with generalized harmonic analysis for
an arbitrary function, as developed by Wiener [83] and Bochner [4]. The spectral
representation of a stochastic process has important applications, some of
which will be considered in a forthcoming paper by Karhunen [40]. An extension
of the spectral representation to a more general class of processes has been
given by Loève [68].

When, in particular, the $x(t)$ process is such that the joint distribution of any
group of variables $x(t_1), \cdots, x(t_n)$ is normal, it follows that any increment
$\Delta z(u)$ is normally distributed. Since two uncorrelated normally distributed
variables are always independent, it follows that in this case the $z(u)$ process
is a differential process with normally distributed increments. Important
results for this case have recently been given by Doob [22].

The properties of continuity, differentiability etc. for processes of the type 
here considered are still incompletely known, and further work is required. 
A further group of important unsolved problems are connected with an inter¬ 
esting decomposition theorem by Wold [84], which holds for processes with 
a discrete time variable. The generalization of this theorem to the continuous 
case does not seem to have so far been given in a final form. 


REFERENCES 

[1] N. Arley, “On the theory of stochastic processes and their applications to the theory
of cosmic radiation,” Thesis, Copenhagen, 1943.

[2] G. M. Bawly, “Ueber eine Verallgemeinerung der Grenzwertsätze der Wahrscheinlichkeitsrechnung,”
Rec. Math. (Mat. Sbornik), N. S., Vol. 1 (1936), pp. 917-929.

[3] S. N. Bernstein, “Sur l’extension du théorème limite du calcul des probabilités aux
sommes de quantités dépendantes,” Math. Ann., Vol. 97 (1927), pp. 1-59.

[4] S. Bochner, “Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse,”
Math. Ann., Vol. 108 (1933), pp. 378-410.

[5] H. Cramér, “On the composition of elementary errors,” Skand. Aktuarietidskr.,
Vol. 11 (1928), pp. 13-74, 141-180.

[6] -, “On the mathematical theory of risk.” Published by the Insurance Company
Skandia, Stockholm, 1930.

[7] -, “Sur les propriétés asymptotiques d’une classe de variables aléatoires,”
C. R. Acad. Sci. Paris, Vol. 201 (1935), pp. 441-443.

[8] -, “Ueber eine Eigenschaft der normalen Verteilungsfunktion,” Math. Zeit.,
Vol. 41 (1936), pp. 405-414.

[9] -, Random variables and probability distributions, Cambridge Tracts in Math.,
Cambridge, 1937.

[10] -, “Sur un nouveau théorème-limite de la théorie des probabilités,” Actualités
Scientifiques, Paris, No. 736 (1938), pp. 5-23.

[11] -, “On the representation of a function by certain Fourier integrals,” Trans.
Amer. Math. Soc., Vol. 46 (1939), pp. 191-201.

[12] -, “On the theory of stationary random processes,” Ann. of Math., Vol. 41
(1940), pp. 215-230.

[13] -, “Deux conférences sur la théorie des probabilités,” Skand. Aktuarietidskr.,
1941, pp. 34-69.

[14] -, “On harmonic analysis in certain functional spaces,” Ark. Mat. Astr. Fys.,
Vol. 28B (1942), pp. 1-7.

[15] -, Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, 1946.

[16] W. Doeblin, “Premiers éléments d’une étude systématique de l’ensemble de puissances
d’une loi de probabilité,” C. R. Acad. Sci. Paris, Vol. 206 (1938), pp. 306-308.

[17] -, “Sur un théorème du calcul des probabilités,” C. R. Acad. Sci. Paris, Vol.
209 (1939), pp. 712-743.

[18] J. L. Doob, “Stochastic processes depending on a continuous parameter,” Trans.
Amer. Math. Soc., Vol. 42 (1937), pp. 107-140.

[19] -, “Stochastic processes with an integral-valued parameter,” Trans. Amer.
Math. Soc., Vol. 44 (1938), pp. 87-150.

[20] -, “Regularity properties of certain families of chance variables,” Trans.
Amer. Math. Soc., Vol. 47 (1940), pp. 455-486.

[21] -, “The law of large numbers for continuous stochastic processes,” Duke Math.
Jour., Vol. 6 (1940), pp. 290-306.

[22] -, “The elementary Gaussian processes,” Annals of Math. Stat., Vol. 15 (1944),
pp. 229-282.

[23] J. L. Doob and W. Ambrose, “On the two formulations of the theory of stochastic
processes depending upon a continuous parameter,” Ann. of Math., Vol. 41
(1940), pp. 737-745.

[24] P. Erdös, “On the law of the iterated logarithm,” Ann. of Math., Vol. 43 (1942),
pp. 419-436.

[25] W. Feller, “Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung,”
Math. Zeit., Vol. 40 (1935), pp. 521-559.

[26] -, “Zur Theorie der stochastischen Prozesse (Existenz- und Eindeutigkeitssätze),”
Math. Ann., Vol. 113 (1936), pp. 113-160.

[27] -, “Ueber das Gesetz der grossen Zahlen,” Acta Univ. Szeged., Vol. 8 (1937),
pp. 191-201.

[28] -, “On the integro-differential equations of purely discontinuous Markoff processes,”
Trans. Amer. Math. Soc., Vol. 48 (1940), pp. 488-515.

[29] -, “Generalization of a probability limit theorem of Cramér,” Trans. Amer.
Math. Soc., Vol. 54 (1943), pp. 361-372.

[30] -, “The general form of the so-called law of the iterated logarithm,” Trans.
Amer. Math. Soc., Vol. 54 (1943), pp. 373-402.

[31] -, “The fundamental limit theorems in probability,” Bull. Amer. Math. Soc.,
Vol. 51 (1945), pp. 800-832.



[32] M. Fréchet, Recherches théoriques modernes sur la théorie des probabilités, Vol. 2,
Paris, 1937.

[33] B. V. Gnedenko, “Sur les fonctions caractéristiques,” Bull. Math. Univ. Moscou,
Vol. 1 (1937), pp. 16-17.

[34] -, “On the theory of the domains of attraction of stable laws,” Uchenye Zapiski
Moskovskogo gosudarstvennogo Universiteta, Vol. 30 (1939), pp. 61-81.

[35] -, “On the theory of limit theorems for sums of independent random variables,”
Bull. Acad. Sci. URSS, Vol. 30 (1939), pp. 181-232, 643-647.

[36] -, “On locally stable probability distributions,” C. R. Acad. Sci. URSS, Vol. 35
(1942), pp. 263-266.

[37] -, “Investigation on the growth of homogeneous random processes,” C. R.
Acad. Sci. URSS, Vol. 36 (1942), pp. 3-41.

[38] E. Hopf, Ergodentheorie, Ergebnisse der Mathematik, Vol. 5, No. 2, Berlin, 1937.

[39] P. L. Hsu, “The approximate distribution of the mean and of the variance of independent
variates,” Annals of Math. Stat., Vol. 16 (1945), pp. 1-29.

[40] K. Karhunen, Paper on stochastic processes, to appear in the Acta Soc. Sci. Fennicae.

[41] A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse
der Mathematik, Vol. 2, No. 4, Berlin, 1933.

[42] -, “Korrelationstheorie der stationären stochastischen Prozesse,” Math. Ann.,
Vol. 109 (1934), pp. 604-615.

[43] -, “Sul dominio di attrazione della legge di Gauss,” Giorn. Ist. Ital. Attuari,
Vol. 6 (1935), pp. 378-393.

[44] -, “Su una legge dei grandi numeri generalizzata,” Giorn. Ist. Ital. Attuari,
Vol. 7 (1936), pp. 365-377.

[45] -, “Zur Kennzeichnung der charakteristischen Funktionen,” Bull. Math.
Univ. Moscou, Vol. 1 (1937), pp. 1-31.

[46] -, “Contribution à l’arithmétique des lois de distribution,” Bull. Math. Univ.
Moscou, Vol. 1 (1937), pp. 6-17.

[47] -, “Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze,” Rec. Math.
N. S., Vol. 2 (1937), pp. 79-117.

[48] A. Khintchine and A. Kolmogoroff, “Ueber Konvergenz von Reihen, deren Glieder
durch den Zufall bestimmt werden,” Rec. Math., Vol. 32 (1925), pp. 668-677.

[49] A. Khintchine and P. Lévy, “Sur les lois stables,” C. R. Acad. Sci. Paris, Vol. 202
(1936), pp. 374-376.

[50] A. Kolmogoroff, “Ueber die Summen durch den Zufall bestimmter unabhängiger
Grössen,” Math. Ann., Vol. 99 (1928) and Vol. 102 (1929), pp. 484-489.

[51] -, “Ueber das Gesetz des iterierten Logarithmus,” Math. Ann., Vol. 101 (1929),
pp. 126-135.

[52] -, “Sur la loi forte des grands nombres,” C. R. Acad. Sci. Paris, Vol. 191 (1930),
pp. 910-911.

[53] -, “Ueber die analytischen Methoden der Wahrscheinlichkeitsrechnung,”
Math. Ann., Vol. 104 (1931), pp. 415-458.

[54] -, “Sulla forma generale di un processo stocastico omogeneo,” Atti Acad. naz.
Lincei, Rend., Vol. 15 (1932), pp. 805-828, 866-869.

[55] -, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik,
Vol. 2, No. 3, Berlin, 1933.

[56] -, “Wiener Spiralen und einige andere interessante Kurven im Hilbertschen
Raum,” C. R. Acad. Sci. URSS, Vol. 26 (1940), pp. 115-118.

[57] -, “Kurven im Hilbertschen Raum, die gegenüber einer einparametrigen Gruppe
von Bewegungen invariant sind,” C. R. Acad. Sci. URSS, Vol. 26 (1940), pp. 6-9.

[58] P. Lévy, Calcul des probabilités, Paris, 1925.

[59] -, “Sur les séries dont les termes sont des variables éventuelles indépendantes,”
Studia Math., Vol. 3 (1931), pp. 117-155.



[60] -, “Sur les intégrales dont les éléments sont des variables aléatoires indépendantes,”
Ann. Scuola norm. super. Pisa, (2), Vol. 3 (1934), pp. 337-366.

[61] -, “Propriétés asymptotiques des sommes de variables aléatoires indépendantes,”
Ann. Scuola norm. super. Pisa, (2), Vol. 3 (1934), pp. 347-402.

[62] -, “La loi forte des grands nombres pour les variables aléatoires enchaînées,”
Jour. Math. Pures Appl., Vol. 15 (1936), pp. 11-24.

[63] -, Théorie de l’addition des variables aléatoires, Paris, 1937.

[64] -, “L’arithmétique des lois de probabilité,” Jour. Math. Pures Appl., Vol. 17
(1938), pp. 17-39.

[65] A. M. Liapounoff, “Nouvelle forme du théorème sur la limite de la probabilité,”
Mémoires Acad. Saint-Pétersbourg, s. 8, Vol. 12 (1901).

[66] J. W. Lindeberg, “Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung,”
Math. Zeit., Vol. 15 (1922), pp. 211-225.

[67] M. Loève, “Étude asymptotique des sommes de variables aléatoires liées,” Jour.
Math. Pures Appl., Vol. (9) 24 (1945), pp. 249-318.

[68] -, “Analyse harmonique générale d’une fonction aléatoire,” C. R. Acad. Sci.
Paris, Vol. 220 (1945), pp. 380-382.

[69] F. Lundberg, “Zur Theorie der Rückversicherung,” Verhandlungen Kongr. f.
Versicherungsmathematik, Wien, 1909.

[70] -, Försäkringsteknisk riskutjämning, Stockholm, 1927.

[71] O. Lundberg, “On random processes and their application to sickness and accident
statistics,” Thesis, Stockholm, 1940.

[72] J. Marcinkiewicz, “Sur une propriété de la loi de Gauss,” Math. Zeit., Vol. 44 (1938),
pp. 612-618.

[73] J. Marcinkiewicz and A. Zygmund, “Sur les fonctions indépendantes,” Fund. Math.,
Vol. 29 (1937), pp. 60-90.

[74] R. E. A. C. Paley and N. Wiener, Fourier Transforms in the Complex Domain, Amer.
Math. Soc. Colloquium Publ., Vol. 19, New York, 1934.

[75] D. Raikov, “On the composition of Poisson laws,” C. R. Acad. Sci. URSS, Vol. 14
(1937), pp. 9-11.

[76] -, “On the decomposition of Gauss and Poisson laws,” Bull. Acad. Sci. URSS,
Ser. Math., 1938, pp. 91-120.

[77] -, “On the composition of analytic distribution functions,” C. R. Acad. Sci.
URSS, Vol. 23 (1939), pp. 511-514.

[78] I. J. Schoenberg, “Metric spaces and positive definite functions,” Trans. Amer.
Math. Soc., Vol. 44 (1938), pp. 522-536.

[79] C. O. Segerdahl, “On homogeneous random processes and collective risk theory,”
Thesis, Stockholm, 1939.

[80] E. Slutsky, “Sur les fonctions aléatoires presque périodiques et sur la décomposition
des fonctions aléatoires stationnaires en composantes,” Actualités Scientifiques,
No. 738, Paris, 1938, pp. 33-55.

[81] S. Täcklind, “Sur le risque de ruine dans des jeux inéquitables,” Skand. Aktuarietidskr.,
1942, pp. 1-42.

[82] N. Wiener, “Differential space,” Jour. Math. Phys. M. I. T., Vol. 2 (1923), pp. 131-172.

[83] -, “Generalized harmonic analysis,” Acta Math., Vol. 55 (1930), pp. 117-258.

[84] H. Wold, “A study in the analysis of stationary time series,” Thesis, Stockholm, 1938.



THE ESTIMATION OF DISPERSION FROM DIFFERENCES 1 

By Anthony P. Morse 2 and Frank E. Grubbs 
Ballistic Research Laboratory , Aberdeen Proving Ground , Maryland 

Summary. The estimation of variance by use of successive differences of
higher order is discussed in this paper. Heretofore, attention has been focused, 
in published works, on estimates of variance obtained by employing the sum of 
squares of deviations from the mean and also by using mean square successive 
differences of the first order [1], [2], [3], [9]. A concise description of the method 
employing differences of any order with appropriate formulae for the precision 
of estimates so obtained and also a practical example on the use of the technique 
are given in section 11. Fundamental contributions to the estimation of 
variance from higher order differences, a study of the efficiency of the technique 
and proper orientation of the subject matter in the field of mathematical statis¬ 
tics are given in sections 2-10 of the paper. 

1. Introduction. It frequently happens that successive observations, made 
at regular intervals of time, are subject to the same standard error while the 
means of the populations from which they are drawn display some kind of trend. 
The type of trend we speak of is brought about because of the manner in which 
we have to take measurements or because of variations in the measuring tech¬ 
nique itself; or, again, the trend may be characteristic of the thing we are meas¬ 
uring. In any event, we may desire to eliminate the trend in order to study 
residual effects. As an example, it is desirable in the field of ballistics to evaluate 
the dispersion of machine guns firing from a moving airplane. 

It may also happen that it is either inexpedient or impossible to estimate the 
standard error of the observations by the method of least squares, for in a large 
number of cases the type of trend is unknown. In this event a method employing 
differences of an appropriate order may prove valuable. The method consists 
merely of arranging the data in a vertical column in the order in which the obser¬ 
vations were taken and then forming difference columns in the usual way of 
order 1,2, up to say 5 or some other number depending on the peculiarities of the 
problem at hand and the number of the original observations. Next, sum the 
squares of the numbers in each column and divide the sum of squares of the $p$th
order differences by $(n - p)\binom{2p}{p}$. When $n > 2$ and $p > 1$, the numbers thus
arrived at are all unbiased estimates of the population variance $\sigma^2$ for the case
where all the observations have the same expected value. In section 11 at the
end of the paper will be found a summary of this method, formulas by which
the precision of the estimate of the variance $\sigma^2$ may be determined, and an example
displaying the stability of this estimate with respect to $p$.

1 This paper is based substantially on a Ballistic Research Laboratory Report [10]
of the same subject by Morse and has been prepared for publication by Grubbs at the suggestion
of R. H. Kent. The authors are grateful to J. V. Lewis and H. L. Meyer for their
many and varied comments, criticisms and suggestions.

2 Now at the University of California, Berkeley, California.

If a strong trend is present then the method of first differences will obviously 
yield an estimate of variance which is fictitiously large and the temptation to 
pass to higher order differences may quite reasonably be yielded to. As a matter 
of fact, unbiased estimates may be hoped for from pth order differences whenever 
there is good reason to suppose that the pth derivative of the trend function is 
small most of the time. However, even in the case of a sinusoidal trend where 
all derivatives have the same magnitude one may obtain good results from higher
differences provided there are at least seven observations in each interval of 
length one period (see section 5 and Table II below). In connection with trends 
such as the sinusoidal type, the hopelessness of getting, say, even a fifth degree 
polynomial to fit over an interval of, say 20 periods is rather evident. It is 
for the above reasons that estimation of variance from higher order differences 
deserves consideration. 
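
The difference method described in this section can be sketched in a few lines. The sketch rests on one stated assumption: the divisor is taken to be $(n - p)\binom{2p}{p}$, which follows from the fact that a $p$th difference of independent observations with common variance $\sigma^2$ has variance $\binom{2p}{p}\sigma^2$. The function names and the sample data are hypothetical.

```python
from math import comb


def differences(values, order):
    """Apply the forward-difference operator `order` times to the sequence."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def variance_from_differences(values, p):
    """Sum of squared p-th order differences divided by (n - p) * C(2p, p)."""
    n = len(values)
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    d = differences(values, p)
    return sum(v * v for v in d) / ((n - p) * comb(2 * p, p))


if __name__ == "__main__":
    # A pure quadratic trend with no noise: the estimate falls to zero at p = 3.
    data = [1, 4, 9, 16]
    for p in (1, 2, 3):
        print(p, variance_from_differences(data, p))
```

The example data contain a trend and no random error, so the first-order estimate is inflated by the trend while the third-order estimate vanishes, which is the behaviour the text describes for strong trends.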

2. Historical comment. A brief historical development of the interest in 
successive differences as a means for estimating dispersion is given in [3]. This 
paper discusses the statistic 



suggested by “Student” [W. S. Gossett] and E. S. Pearson and points out the
relevant work of Jordan, Helmert, Vallier, Cranz, and Becker. It seems that
Jordan devised methods based on sums of powers of the differences, whereas
Helmert gave more careful consideration to the case of the first power, i.e. the
sum of absolute differences. Reference [3] points out, however, that in these
two cases all the $n(n - 1)/2$ differences that can be established from a sample of
$n$ observations were included in the estimates of dispersion recommended by
Jordan and Helmert, so that the estimate was of no value in reducing the effect
of a trend. Continuing the remarks of [3], we learn that in ballistics Vallier
appears to have been the first to estimate dispersion from successive differences
and that Cranz and Becker commended the mean successive difference

$$E_1 = \frac{\sum_i |x_{i+1} - x_i|}{n - 1}$$

in estimating dispersion in range of guns since they were aware of variable external
effects (such as tail winds) on a projectile. In this country, Bennett [1]
appears to have suggested the use of successive differences independently of
European ballisticians. In this connection, Bennett suggested that the probable





ANTHONY P. MORSE AND FRANK E. GRUBBS 


TABLE I 

The Efficiency, W(n, p), of $\delta^2_{n,p}$ As An Estimate of $\sigma^2$ 

  n      p=1      p=2      p=3      p=4      p=5      p=6      p=7      p=8      p=9     p=10
  2   1.00000
  3    .80000   .50000
  4    .75000   .46154   .33333
  5    .72727   .46552   .32000   .25000
  6    .71429   .47213   .33149   .24427   .20000
  7    .70588   .47771   .34453   .25510   .19672   .16667
  8    .70000   .48214   .35537   .26871   .20633   .16471   .14286
  9    .69565   .48568   .36408   .28071   .21888   .17274   .14159   .12500
 10    .69231   .48855   .37113   .29071   .23058   .18385   .14830   .12414   .11111
 11    .68966   .49091   .37691   .29904   .24070   .19476   .15802   .12978   .11050   .10000
 12    .68750   .49288   .38173   .30602   .24934   .20450   .16798   .13827   .11529   .09955
 13    .68571   .49455   .38580   .31194   .25672   .21300   .17714   .14729   .12271   .10366
 14    .68421   .49598   .38928   .31701   .26308   .22039   .18530   .15581   .13086   .11018
 15    .68293   .49722   .39228   .32139   .26859   .22684   .19250   .16353   .13874   .11754
 16    .68182   .49831   .39490   .32522   .27342   .23251   .19887   .17045   .14601   .12481
 17    .68085   .49926   .39721   .32859   .27767   .23752   .20452   .17664   .15260   .13162
 18    .68000   .50011   .39925   .33158   .28145   .24197   .20956   .18218   .15855   .13787
 19    .67925   .50087   .40107   .33424   .28482   .24595   .21407   .18715   .16393   .14356
 20    .67857   .50155   .40271   .33663   .28784   .24953   .21813   .19164   .16879   .14875
 21    .67797   .50216   .40419   .33880   .29058   .25276   .22181   .19571   .17321   .15347
 22    .67742   .50272   .40553   .34075   .29306   .25569   .22515   .19941   .17723   .15778
 23    .67692   .50323   .40675   .34254   .29532   .25837   .22819   .20279   .18091   .16173
 24    .67647   .50370   .40787   .34417   .29739   .26082   .23098   .20588   .18428   .16535
 25    .67606   .50413   .40889   .34567   .29929   .26307   .23354   .20873   .18738   .16869
 26    .67568   .50452   .40984   .34706   .30104   .26514   .23590   .21135   .19024   .17177
 27    .67533   .50489   .41071   .34833   .30266   .26705   .23809   .21378   .19289   .17463
 28    .67500   .50523   .41152   .34951   .30416   .26884   .24012   .21603   .19535   .17728
 29    .67470   .50555   .41228   .35062   .30555   .27049   .24200   .21812   .19764   .17975
 30    .67442   .50585   .41298   .35165   .30686   .27203   .24375   .22007   .19978   .18205
 31    .67416   .50612   .41363   .35260   .30807   .27347   .24539   .22190   .20177   .18420
 32    .67391   .50638   .41425   .35350   .30921   .27482   .24693   .22361   .20364   .18622
 33    .67368   .50662   .41482   .35434   .31027   .27608   .24837   .22521   .20539   .18811
 34    .67347   .50685   .41536   .35513   .31128   .27727   .24973   .22672   .20704   .18989
 35    .67327   .50707   .41587   .35588   .31222   .27839   .25101   .22814   .20859   .19157
 36    .67308   .50727   .41635   .35658   .31312   .27945   .25221   .22949   .21006   .19315
 37    .67290   .50746   .41681   .35724   .31396   .28045   .25335   .23075   .21145   .19465
 38    .67273   .50764   .41724   .35787   .31476   .28140   .25443   .23195   .21276   .19606
 39    .67257   .50781   .41765   .35847   .31551   .28229   .25545   .23309   .21401   .19741
 40    .67241   .50797   .41804   .35904   .31623   .28314   .25642   .23417   .21519   .19868



ESTIMATION OF DISPERSION 


TABLE I (Continued) 

   n     p=1      p=2      p=3      p=4      p=5      p=6      p=7      p=8      p=9     p=10
   42   .67213   .50828   .41875   .36009   .31756   .28472   .25822   .23617   .21738   .20105
   44   .67188   .50855   .41941   .36104   .31877   .28615   .25986   .23799   .21937   .20320
   46   .67164   .50880   .42000   .36191   .31987   .28745   .26135   .23965   .22118   .20516
   48   .67143   .50903   .42055   .36271   .32088   .28865   .26271   .24117   .22284   .20695
   50   .67123   .50925   .42105   .36343   .32180   .28975   .26397   .24256   .22437   .20860
   52   .67105   .50944   .42151   .36411   .32266   .29076   .26512   .24385   .22578   .21012
   54   .67089   .50962   .42193   .36473   .32345   .29170   .26619   .24504   .22708   .21153
   56   .67073   .50979   .42233   .36531   .32418   .29257   .26718   .24614   .22829   .21284
   58   .67059   .50995   .42270   .36585   .32487   .29338   .26811   .24717   .22941   .21405
   62   .67033   .51022   .42337   .36682   .32609   .29484   .26977   .24903   .23144   .21624
   66   .67010   .51048   .42395   .36767   .32718   .29612   .27123   .25065   .23322   .21817
   70   .66990   .51069   .42447   .36843   .32813   .29725   .27252   .25209   .23479   .21987
   74   .66972   .51089   .42492   .36910   .32898   .29826   .27368   .25337   .23619   .22138
   78   .66957   .51107   .42534   .36970   .32975   .29917   .27471   .25452   .23745   .22274
   82   .66942   .51122   .42571   .37024   .33043   .29998   .27564   .25556   .23859   .22397
   90   .66917   .51150   .42636   .37118   .33162   .30139   .27725   .25735   .24065   .22609
   98   .66897   .51172   .42689   .37197   .33262   .30257   .27860   .25885   .24219   .22786
  106   .66879   .51192   .42735   .37263   .33346   .30357   .27974   .26012   .24358   .22936
  114   .66864   .51208   .42774   .37321   .33418   .30443   .28071   .26121   .24477   .23065
  122   .66851   .51223   .42808   .37370   .33482   .30518   .28156   .26216   .24581   .23177
  138   .66829   .51247   .42864   .37452   .33585   .30641   .28297   .26372   .24752   .23362
  154   .66812   .51266   .42909   .37517   .33667   .30738   .28408   .26496   .24887   .23508
  170   .66798   .51281   .42944   .37570   .33734   .30817   .28498   .26596   .24997   .23627
  202   .66777   .51304   .43000   .37649   .33836   .30937   .28635   .26749   .25164   .23808
  234   .66762   .51322   .43040   .37708   .33909   .31025   .28735   .26860   .25285   .23939
  266   .66751   .51335   .43070   .37752   .33965   .31091   .28810   .26944   .25377   .24038
  330   .66734   .51353   .43112   .37814   .34044   .31185   .28917   .27063   .25508   .24179
  394   .66723   .51365   .43141   .37856   .34097   .31248   .28990   .27143   .25596   .24274
  522   .66709   .51381   .43178   .37910   .34164   .31327   .29081   .27244   .25707   .24394
  778   .66695   .51396   .43215   .37963   .34233   .31408   .29173   .27347   .25819   .24516
 1290   .66684   .51409   .43245   .38007   .34288   .31474   .29248   .27430   .25910   .24613
 2314   .66676   .51418   .43264   .38036   .34325   .31518   .29298   .27486   .25971   .24680
    ∞   .66667   .51429   .43290   .38073   .34372   .31573   .29361   .27556   .26048   .24763

error should be estimated from the root mean square successive differences as 
follows: 

$$\text{P.E.} = .6745\sqrt{\frac{\sum_{i=1}^{n-1}(x_{i+1}-x_i)^2}{2(n-1)}}.$$

In 1940, J. von Neumann and R. H. Kent in [2] investigated further the estimation 
of probable error from mean square successive differences (sums of squares 
of first differences). J. von Neumann, R. H. Kent, H. R. Bellinson, and B. I. 
Hart [3] considered the distribution of 

$$\delta^2 = \frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1}-x_i)^2$$

in a paper which appeared in June 1941. J. D. Williams [4] obtained the 
moments of $\eta = \delta^2/s^2$, where 

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$

and indicated that the rth moment of $\eta$ is equal to the rth moment of $\delta^2$ divided 
by the rth moment of $s^2$. The distribution of the ratio of the mean square 
successive difference to the variance has been published by J. von Neumann 
[5], [6] and B. I. Hart tabulated the probability integral and obtained percentage 
points for this statistic ([7], [8]). Indeed, it should be remarked that the statistical 
theory of successive differences is allied with the problem of serial correlation 
[9]. Finally, the use of squared differences of higher order than the first for 
estimating variance appears to have been suggested by A. A. Bennett. Quite 
independently, a treatment of the subject was given by Morse [10] in connection 
with problems on exterior ballistics. Various results on successive-difference 
estimation including significance tests have been given by Tintner [13]. One of 
Tintner's tests involves the use of selected sets of differences. 

3. Definitions and notations. Suppose the observations $x_1, x_2, x_3, \cdots, x_n$ 
are made at times $a = t_1 < t_2 < \cdots < t_n = b$ and the $t_i$ are uniformly spaced 
without error. Let $f(t_i)$ be the true trend so that $\eta_i = f(t_i)$ is the mean of the 
population from which $x_i$ is drawn and $\epsilon_i = x_i - \eta_i$ is a random error. Further, 
let p be a non-negative integer less than n and denote the ith backward difference 
of order p of x by $\Delta^p x_i$, i.e. 

$$\Delta^p x_i = \Delta^{p-1}x_i - \Delta^{p-1}x_{i-1} = \sum_{r=0}^{p}(-1)^r\binom{p}{r}x_{i-r},$$

where $\binom{m}{n} = \dfrac{m!}{n!\,(m-n)!}$ and $i \ge p + 1$. 

We define the following: 

(1)  $\delta^2_{n,p} = \dfrac{1}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}(\Delta^p\epsilon_i)^2$; 

(2)  $d^2_{n,p} = \dfrac{1}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}(\Delta^p x_i)^2$; 

(3)  $\nu^2_{n,p} = \dfrac{1}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}(\Delta^p\eta_i)^2$; 

(4)  $k_{n,p} = \dfrac{2}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}(\Delta^p\eta_i)(\Delta^p\epsilon_i)$. 

By $E(u)$ we will mean the expected value of u, whereas the variance of u will 
be denoted by 

$$\mathrm{Var}(u) = E\{u - E(u)\}^2.$$

Basically, we shall assume that the $\epsilon_i$ are sufficiently Gaussian and independent 
that 

$$E(\epsilon_i) = E(\epsilon_i^3) = 0, \qquad E(\epsilon_i^2) = \sigma^2,$$

$$\mu_4 = E(\epsilon_i^4) = 3\sigma^4,$$

$$E(\epsilon_i^{\alpha}\epsilon_j^{\beta}) = E(\epsilon_i^{\alpha})\,E(\epsilon_j^{\beta})$$

whenever $i$, $j$, $\alpha$ and $\beta$ are positive integers for which 

$$i \ne j, \qquad 1 \le i \le n, \qquad 1 \le j \le n.$$

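4. Expected values. We will now determine the mean or expected values 
of $\delta^2_{n,p}$ and $d^2_{n,p}$. 

$$E(\delta^2_{n,p}) = \frac{1}{\binom{2p}{p}(n-p)}\,E\left\{\sum_{i=p+1}^{n}(\Delta^p\epsilon_i)^2\right\} = \frac{1}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}\sum_{r=0}^{p}\binom{p}{r}^2\sigma^2,$$

or 

(5)  $E(\delta^2_{n,p}) = \sigma^2$ 

(see Lemma 1.3 of section 6 below). 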
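The divisor $\binom{2p}{p}$ in definitions (1) to (4) is what makes (5) hold: a pth difference has coefficients $\pm\binom{p}{r}$, and for independent errors of variance $\sigma^2$ its mean square is $\sigma^2$ times the sum of their squares. A minimal Python sketch of that identity follows; the function name is ours, chosen only for illustration.

```python
from math import comb


def sum_squared_coefficients(p: int) -> int:
    """Sum over r of C(p, r)^2, the squared coefficients of a pth difference."""
    return sum(comb(p, r) ** 2 for r in range(p + 1))


# The sum equals C(2p, p), so dividing by C(2p, p) leaves sigma^2 on average.
for p in range(0, 11):
    assert sum_squared_coefficients(p) == comb(2 * p, p)
```

For p = 2, for example, the coefficients are 1, -2, 1 and the divisor is 1 + 4 + 1 = 6.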



Continuing, we have 

$$E(d^2_{n,p}) = \frac{1}{\binom{2p}{p}(n-p)}\,E\left\{\sum_{i=p+1}^{n}(\Delta^p\epsilon_i + \Delta^p\eta_i)^2\right\},$$

or 

(6)  $E(d^2_{n,p}) = \sigma^2 + \nu^2_{n,p}$. 

Consequently, we observe, $d^2_{n,p}$ is on the average larger than $\sigma^2$ by the quantity 
$\nu^2_{n,p}$. In a particular problem, therefore, we are faced with the situation of 
choosing that combination of n and p which (i) regulates the size of $\nu^2_{n,p}$ and (ii) 
gives the desired precision of our estimate of variance. 

5. The magnitude of $\nu^2_{n,p}$. In order to study the size of $\nu^2_{n,p}$, we will derive 
for this quantity an upper bound which will indicate the applicability of the 
method of differences to non-polynomial as well as polynomial trends. 

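Now, 

$$\Delta^p\eta_i = \Delta^p f(t_i) = \int_{t_{i-1}}^{t_i}\int_0^h\cdots\int_0^h f^{(p)}(y_1 - y_2 - \cdots - y_p)\,dy_p\,dy_{p-1}\cdots dy_1,$$

where $t_r - t_{r-1} = h$, by straightforward integration. It will be convenient to 
change the order of integration; thus 

$$\Delta^p f(t_i) = \int_0^h\cdots\int_0^h\int_{t_{i-1}}^{t_i} f^{(p)}(y_1 - y_2 - \cdots - y_p)\,dy_1\,dy_p\cdots dy_2.$$

Since, from Schwarz's inequality it is clear that 

$$\left\{\int_\alpha^\beta g(s)\,ds\right\}^2 \le (\beta - \alpha)\int_\alpha^\beta\{g(s)\}^2\,ds$$

whenever $\alpha$ and $\beta$ are real numbers and g is integrable, we have 

$$(\Delta^p\eta_i)^2 \le h^p\int_0^h\cdots\int_0^h\int_{t_{i-1}}^{t_i}\{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\,dy_1\,dy_p\cdots dy_2.$$

Also, 

$$\sum_{i=p+1}^{n}(\Delta^p\eta_i)^2 \le h^p\int_0^h\cdots\int_0^h\int_{t_p}^{b}\{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\,dy_1\,dy_p\cdots dy_2.$$

But for $0 \le r \le (p-1)h = t_p - a$ we have 

$$\int_{t_p}^{b}\{f^{(p)}(y_1 - r)\}^2\,dy_1 = \int_{t_p - r}^{b - r}\{f^{(p)}(s)\}^2\,ds \le \int_a^b\{f^{(p)}(s)\}^2\,ds.$$

Consequently 

$$\sum_{i=p+1}^{n}(\Delta^p\eta_i)^2 \le h^p\int_0^h\cdots\int_0^h\int_a^b\{f^{(p)}(s)\}^2\,ds\,dy_p\cdots dy_2 = h^{2p-1}\int_a^b\{f^{(p)}(s)\}^2\,ds.$$

Since $h = \dfrac{b-a}{n-1}$, we have finally 

(7)  $\nu^2_{n,p} \le \dfrac{b-a}{\binom{2p}{p}(n-p)}\left(\dfrac{b-a}{n-1}\right)^{2p-1}\displaystyle\int_a^b\dfrac{\{f^{(p)}(s)\}^2\,ds}{b-a},$ 

which is an upper bound for $\nu^2_{n,p}$ in terms of the average value of the square of 
the pth derivative of the trend function f. 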

If the trend function f is of the polynomial form, 

$$f(t) = \sum_{r=0}^{p} c_r t^r,$$

then the effect of the trend can be eliminated from our observations by estimating 
dispersion from (p + 1)st differences. However, if it is known that the trend is 
of polynomial form, then an estimate of dispersion based on least squares would, 
of course, be better. In fact, it will be shown later that the precision of $\delta^2_{n,p}$ 
decreases markedly as p increases. The use of $d^2_{n,p}$ as an estimate of $\sigma^2$ is primarily 
of value when the type of trend is unknown; however, even when the type 
of trend is known the computational simplicity of $d^2_{n,p}$ may offset to some extent 
its lack of optimum precision. 

Let us reflect on the magnitude of $\nu^2_{n,p}$ over a single period of a sinusoidal trend, 
say $f(t) = \sin t$. In (7) we set $a = 0$, $b = 2\pi$ and secure 

$$\nu^2_{n,p} \le \frac{\pi}{\binom{2p}{p}(n-p)}\left(\frac{2\pi}{n-1}\right)^{2p-1}.$$

Taking n to be the number of observations for a complete period, a tabulation of 
the upper bound for $\nu^2_{n,p}$ for this case is given in Table II. Thus, when there 
are about seven or more observations in each interval of length one period, estimation 
of dispersion from higher order differences may prove of considerable 
value even for this rather extreme type of trend. 
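The sinusoidal bound is easy to evaluate directly. The following Python sketch is an illustration only (the function name `sine_trend_bound` is hypothetical); it computes the right-hand side of (7) for f(t) = sin t on one period, where the average of the squared pth derivative is 1/2.

```python
from math import comb, pi


def sine_trend_bound(n: int, p: int) -> float:
    """Upper bound (7) on nu^2_{n,p} for f(t) = sin t with a = 0, b = 2*pi."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    # (b - a) * average of {f^(p)}^2 equals 2*pi * 1/2 = pi; h = 2*pi / (n - 1).
    return pi / (comb(2 * p, p) * (n - p)) * (2 * pi / (n - 1)) ** (2 * p - 1)


for n in (5, 7, 9):
    print(n, [round(sine_trend_bound(n, p), 3) for p in range(1, min(n, 6))])
```

With seven observations per period the bound falls from roughly .27 at p = 1 to under .01 at p = 5, which is the behaviour the paragraph above describes.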

6. Some combinatorial relations. Although we will ultimately establish 
expressions for the variances of $\delta^2_{n,p}$ and $d^2_{n,p}$, it appears desirable to give first a 
number of combinatorial relations which present themselves in the computation 
of moments. The relations are easily checked and most of them are possibly 
well known. Nevertheless, it will be convenient to record them for reference 
and in some instances to give proofs. In what follows it will be understood that 
$\binom{p}{q} = 0$ whenever p and q are not such integers that $0 \le q \le p$. 



TABLE II 

 p \ n      5       6       7       8       9       10
   1      .617    .395    .274    .201    .154    .110
   2      .676    .260    .120    .063    .036    .016
   3      .751    .164    .049    .018    .008    .002
   4     1.06     .111    .021    .005    .002    .0003
   5       -      .098    .009    .002    .0004   .0000


Lemma 1.1. 

Lemma l- 2 . (?) = P f ). 

Lemma 1.3. £ (?) ( r J .) = ( p % s )- 


Proof: 


Hence 


and 


?(?)*' = = Id + Z )’’) 2 = 

=??©(.->• 

(?)-?(:) GO- 


£(>■ 


CO-?C)C + :-,)-?C)CO-?U 

Lemma 1.4. // p 2 + r 2 > 0 then = ( ? r ^ (r - l)' 
Lemma 1.5. (p - 2 r) 0*) = pl( P ~ ^ - ( v ~ j\l. 

i—i* o. - -)(?)* - hc 7 7 - (r r o*}- 

Proof: Multiply, using. 1.4 and 1.5. 

L “" i 17 -’ "(p + r)’ - »{(p-, 1 )’ - C - 7- J)' 


'Major A. A. Bennett communicated this Lemma. 





Proof: (« - = *{(* < *) - (j _ j)}from 1.6. 

Put 8 — 2p, t = p — r, then 

+ r) ’ " - ')’ ~ (p- 7- l)}- 

Lemma 1.8. If f is a function , i , n, p are integers and p + 1 < i < n, 

sC •»-§(?)*’■ 

Proof: 

(?)*»• 

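Lemma 1.9. If $-\infty < A(r, s) = A(s, r) < \infty$ for each integer r and s, then 

$$E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n}A(r,s)\,\epsilon_r\epsilon_s\right\}^2\right) = (\mu_4 - 3\sigma^4)\sum_{r=1}^{n}A(r,r)^2 + \sigma^4\left\{\sum_{r=1}^{n}A(r,r)\right\}^2 + 2\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n}A(r,s)^2.$$

Proof: Let $N(r, s) = 1$ when $r < s$ and let $N(r, s) = 0$ otherwise. Clearly 

$$\sum_{r=1}^{n}\sum_{s=1}^{n}A(r,s)\,\epsilon_r\epsilon_s = \sum_{r=1}^{n}A(r,r)\,\epsilon_r^2 + 2\sum_{r=1}^{n}\sum_{s=1}^{n}N(r,s)A(r,s)\,\epsilon_r\epsilon_s,$$

and 

$$E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n}A(r,s)\,\epsilon_r\epsilon_s\right\}^2\right) = E\left(\left\{\sum_{r=1}^{n}A(r,r)\,\epsilon_r^2\right\}^2\right) + 4E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n}N(r,s)A(r,s)\,\epsilon_r\epsilon_s\right\}^2\right).$$

Now 

$$E\left(\left\{\sum_{r=1}^{n}A(r,r)\,\epsilon_r^2\right\}^2\right) = (\mu_4 - \sigma^4)\sum_{r=1}^{n}A(r,r)^2 + \sigma^4\left\{\sum_{r=1}^{n}A(r,r)\right\}^2,$$

and 

$$4E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n}N(r,s)A(r,s)\,\epsilon_r\epsilon_s\right\}^2\right) = 4\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n}N(r,s)A(r,s)^2 = 2\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n}A(r,s)^2 - 2\sigma^4\sum_{r=1}^{n}A(r,r)^2.$$

The last three relations combine to yield the desired result. 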


Lemma 1.10. 


(n - p) 2 E{b\, p ) 


+2 ”'§§Lt(.-X-.)r 


Proof: Helped by 1.8, check that 




Therefore 


(?) <»- 


and apply 1.9 to complete the proof. 
Lemma 1.11. 


§§U,G-.) 

C+0 - 2p (*?')' + ^ (?» *) 


Proof. 


- .SSC.-.)C-.)0--)C-r) 

- s I (O' (.+5 - <) 0 - 0 G -,) ■ """3 u. 

- _£> £ § (0 (.+j--)(0C+i-i) j "" gi ' 8 '* sain ' 





-Jut?© (,+'-<)? ®G+J-<) 

- ?, ?, {?(?)C+5-<)}’ - ,t,t0 +?-*- ** 

-X ( ” _,,_|r|) (p+--)’ 

CM- 2 §'(,?-)’ 

7-1)} - "** L71 

* <" - ”>(p + rj~ p ')' ~ (* - » - .)} 

- C+•)’ - * (*; *)’ + ^ : ‘) ! • 


Lemma 1.12. 


Proof. 


§-?,(. ir)’-“-'»(?)■ 


se(.!)'= £ £( p Y,fromi.8 ; 

r—1 t—p+1 v v »— p+1 r—1 v »— p+1 r—0 y / 


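7. The variances of $\delta^2_{n,p}$ and $d^2_{n,p}$. In order to get some idea as to the efficiency 
of the statistics $\delta^2_{n,p}$ and $d^2_{n,p}$, we will examine their variances. We have 

$$\binom{2p}{p}^2(n-p)^2\,\mathrm{Var}(\delta^2_{n,p}) = \binom{2p}{p}^2(n-p)^2\left\{E(\delta^4_{n,p}) - [E(\delta^2_{n,p})]^2\right\}$$

$$= 2(n-p)\sigma^4\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 - 4p\sigma^4\binom{2p-1}{p}^2 + 4p\sigma^4\binom{2p-1}{n-1}^2$$

with the aid of Lemmas 1.10, 1.11, 1.12 and using the relation $\mu_4 - 3\sigma^4 = 0$. 
Thus, 

(8)  $\binom{2p}{p}^2(n-p)^2\,\mathrm{Var}(\delta^2_{n,p}) = 2\sigma^4\left\{(n-p)\displaystyle\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 - 2p\binom{2p-1}{p}^2 + 2p\binom{2p-1}{n-1}^2\right\}.$ 

If $2p < n$, then 

$$\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 = \sum_{r=-p}^{p}\binom{2p}{p+r}^2 = \sum_{r=0}^{2p}\binom{2p}{r}^2 = \binom{4p}{2p}.$$

Moreover, $\binom{2p-1}{n-1} = 0$. 

Therefore, 

(9)  $\binom{2p}{p}^2(n-p)^2\,\mathrm{Var}(\delta^2_{n,p}) = 2(n-p)\binom{4p}{2p}\sigma^4 - 4p\binom{2p-1}{p}^2\sigma^4$ 

when $2p < n$. 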
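Relation (9) can be checked numerically for small n and p. The Python sketch below is an independent illustration, not the authors' derivation: it writes the sum of squared pth differences of the errors as a quadratic form with matrix A = D'D, where D holds the difference coefficients, and uses the consequence of Lemma 1.9 that for $\mu_4 = 3\sigma^4$ the variance of such a form is $2\sigma^4$ times the sum of the squared entries of A. All function names are hypothetical, and the right-hand side is taken in the form $2(n-p)\binom{4p}{2p} - 4p\binom{2p-1}{p}^2$ in units of $\sigma^4$.

```python
from math import comb


def difference_matrix(n, p):
    """Rows hold the coefficients (-1)^r C(p, r) of each pth backward difference."""
    rows = []
    for i in range(p, n):  # 0-based position of the latest observation used
        row = [0] * n
        for r in range(p + 1):
            row[i - r] = (-1) ** r * comb(p, r)
        rows.append(row)
    return rows


def scaled_variance_direct(n, p):
    """C(2p,p)^2 (n-p)^2 Var(delta^2) in units of sigma^4, from the quadratic form."""
    d = difference_matrix(n, p)
    a = [[sum(row[j] * row[k] for row in d) for k in range(n)] for j in range(n)]
    return 2 * sum(v * v for line in a for v in line)


def scaled_variance_formula(n, p):
    """Closed form for 2p < n, in units of sigma^4."""
    return 2 * (n - p) * comb(4 * p, 2 * p) - 4 * p * comb(2 * p - 1, p) ** 2


for n, p in [(5, 1), (7, 2), (9, 3), (12, 4)]:
    assert scaled_variance_direct(n, p) == scaled_variance_formula(n, p)
```

Both routes use integer arithmetic only, so agreement is exact rather than approximate.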

As for the variance of $d^2_{n,p}$, we have 

$$\mathrm{Var}(d^2_{n,p}) = E\{d^2_{n,p} - \nu^2_{n,p} - \sigma^2\}^2 = E\{\delta^2_{n,p} + k_{n,p} + \nu^2_{n,p} - \nu^2_{n,p} - \sigma^2\}^2 = E\{(\delta^2_{n,p} - \sigma^2) + k_{n,p}\}^2,$$

or 

(10)  $\mathrm{Var}(d^2_{n,p}) = \mathrm{Var}(\delta^2_{n,p}) + E(k^2_{n,p}),$ 

since $E[(\delta^2_{n,p} - \sigma^2)k_{n,p}] = 0$. 

However, from Schwarz's inequality, it is guaranteed that 

$$E(k^2_{n,p}) \le 4\nu^2_{n,p}\sigma^2.$$

Thus 

(11)  $\mathrm{Var}(d^2_{n,p}) \le \mathrm{Var}(\delta^2_{n,p}) + 4\nu^2_{n,p}\sigma^2.$ 

An upper bound has already been given for $\nu^2_{n,p}$ in section 5 above. 

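8. The efficiency of $\delta^2_{n,p}$. It is appropriate to consider the efficiency (as 
defined by Fisher [11]) of the statistic $\delta^2_{n,p}$. In this sense, the efficiency of 
$\delta^2_{n,p}$ is given by 

$$W(n, p) = \frac{\mathrm{Var}(s^2)}{\mathrm{Var}(\delta^2_{n,p})}, \qquad\text{where } s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}.$$

Accordingly, 

$$W(n, p) = \frac{2\sigma^4}{(n-1)\,\mathrm{Var}(\delta^2_{n,p})},$$

or 

(12)  $W(n, p) = \dfrac{(n-p)^2\binom{2p}{p}^2}{(n-1)\left\{(n-p)\displaystyle\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 - 2p\binom{2p-1}{p}^2 + 2p\binom{2p-1}{n-1}^2\right\}}.$ 

If $2p < n$, 

(13)  $W(n, p) = \dfrac{(n-p)^2\binom{2p}{p}^2}{(n-1)\left\{(n-p)\binom{4p}{2p} - 2p\binom{2p-1}{p}^2\right\}}.$ 

Formulas (12) and (13) were used in preparing Table I given at the end of the 
paper. For convenience in using formulas (1) and (2) the binomial coefficients 
$\binom{2p}{p}$ for $0 \le p \le 10$ are given in Table III. 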
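As a cross-check on Table I, the efficiency can be evaluated in exact rational arithmetic. The sketch below is an illustration under a stated assumption: it reads the braces of (12) as (n - p) times the sum of $\binom{2p}{p+r}^2$ over $|r| \le n-p-1$, minus $2p\binom{2p-1}{p}^2$, plus $2p\binom{2p-1}{n-1}^2$. That reading reproduces the closed forms for p = 1 and p = 2 and the small-n entries tested here, but it is our reconstruction rather than a quotation. The name `efficiency` is hypothetical.

```python
from fractions import Fraction
from math import comb


def efficiency(n: int, p: int) -> Fraction:
    """W(n, p) as an exact fraction, for 1 <= p < n."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    m = min(n - p - 1, p)  # binomial terms vanish beyond |r| = p
    total = sum(comb(2 * p, p + r) ** 2 for r in range(-m, m + 1))
    braces = ((n - p) * total
              - 2 * p * comb(2 * p - 1, p) ** 2
              + 2 * p * comb(2 * p - 1, n - 1) ** 2)  # last term is 0 when 2p < n
    return Fraction((n - p) ** 2 * comb(2 * p, p) ** 2, (n - 1) * braces)


print(float(efficiency(10, 3)))   # compare with the n = 10, p = 3 entry of Table I
print(efficiency(3, 2))           # a single difference from three observations
```

Returning a `Fraction` keeps the tabulated five-decimal values a matter of final rounding only.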


If $n > 2$, then 

(15)  $W(n, 1) = \dfrac{2}{3}\cdot\dfrac{1}{1 - \dfrac{1}{3n-3}} = \dfrac{2(n-1)}{3n-4},$ 

as was pointed out by von Neumann, Kent, Bellinson, and Hart in [3]. 
If $n > 4$, then 

(16)  $W(n, 2) = \dfrac{36}{\left\{1 + \dfrac{1}{n-2}\right\}\left\{70 - \dfrac{36}{n-2}\right\}} = \dfrac{18(n-2)^2}{(n-1)(35n-88)}.$ 

As a limiting value for n, we have 

(17)  $W(\infty, p) \equiv \lim_{n\to\infty} W(n, p) = \dfrac{\binom{2p}{p}^2}{\binom{4p}{2p}}.$ 

Using Stirling's formula for the approximation to the factorial, we have 

$$\lim_{p\to\infty}\sqrt{p}\,W(\infty, p) = \sqrt{\frac{2}{\pi}}.$$

Thus, as $p \to \infty$, $W(\infty, p)$ tends to zero and is asymptotically equal to $\sqrt{\dfrac{2}{\pi p}}$. 

TABLE III 
The Binomial Coefficient $\binom{2p}{p}$ 

   p     C(2p, p)
   0          1
   1          2
   2          6
   3         20
   4         70
   5        252
   6        924
   7       3432
   8      12870
   9      48620
  10     184756


For the case $n \ge 2$, $p \ge 1$ and f constant, then $s_n^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ and $\delta^2_{n,p}$ 
and $d^2_{n,p}$ are all unbiased estimates of the population variance $\sigma^2$. Moreover, 
for this case 

$$W(n, p) = \frac{\mathrm{Var}(s_n^2)}{\mathrm{Var}(\delta^2_{n,p})} = \frac{\mathrm{Var}(s_n^2)}{\mathrm{Var}(d^2_{n,p})}.$$

Using $s_m^2$ based on m - 1 degrees of freedom and keeping the trend, f, constant, 
then m and n may be chosen so that approximately 

$$\mathrm{Var}(s_m^2) = \mathrm{Var}(d^2_{n,p})$$

and for a normal population this means that 

$$m = 1 + (n-1)\,W(n, p).$$


Using Table I, it may be seen that for constant trend, /, the worth of c&.io as 
an estimate of a for a normal population is about the same as that of $n , whereas 
that of d\o, i is about equivalent to . However, if the trend / is not constant 
then the worth of $s_n^2$ as an estimate of $\sigma^2$ is diminished while that of $d^2_{n,p}$ is 
increased. 

Similarly, if the trend is cubic over 20 observations then least squares gives 
an unbiased estimate of $\sigma^2$ based on 16 degrees of freedom, whereas $d^2_{20,4}$ gives an 
estimate equivalent in precision to about 6.4 degrees of freedom. However, if 
only eight observations follow a cubic trend, then least squares furnishes an 
unbiased estimate of $\sigma^2$ based on four degrees of freedom whereas $d^2_{8,4}$ furnishes an 
estimate equivalent to about 1.9 degrees of freedom. Thus, in the case of 20 
observations, cubic least squares is, so to speak, 2.5 times as valuable as $d^2_{20,4}$; 
in the case of eight observations, cubic least squares is 2.1 times as valuable 
as $d^2_{8,4}$. 
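The degrees-of-freedom comparison above is the product (n - 1) W(n, p). A small Python sketch follows; the function names are hypothetical, and the efficiency is computed from the sum over $|r| \le n-p-1$ of $(n-p-|r|)\binom{2p}{p+r}^2$, a compact form that we assume to be algebraically equivalent to the paper's expression for the variance of $\delta^2_{n,p}$ under Gaussian errors.

```python
from math import comb


def efficiency(n: int, p: int) -> float:
    """W(n, p) for Gaussian errors and 1 <= p < n (compact form, assumed equivalent)."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    m = min(n - p - 1, p)
    total = sum((n - p - abs(r)) * comb(2 * p, p + r) ** 2 for r in range(-m, m + 1))
    return (n - p) ** 2 * comb(2 * p, p) ** 2 / ((n - 1) * total)


def equivalent_degrees_of_freedom(n: int, p: int) -> float:
    """Degrees of freedom of an ordinary variance estimate with the same precision."""
    return (n - 1) * efficiency(n, p)


for n in (20, 8):
    edf = equivalent_degrees_of_freedom(n, 4)
    cubic_ls_df = n - 4  # a cubic has four fitted coefficients
    print(n, round(edf, 1), round(cubic_ls_df / edf, 1))
```

The ratio printed in the last column is the sense in which least squares is "so many times as valuable" when the cubic form of the trend is actually known.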

It might be mentioned that the method of differences is of value in estimating 
goodness of fit. If the fit is good, then our estimate of $\sigma^2$ derived from least 
squares should on the average be equal to the estimate derived from a suitable 
$d^2_{n,p}$. If the fit is poor then $d^2_{n,p}$ will be smaller on the average than the former. 

9. The approximate probable error in estimating $\sigma$ from differences. The 
approximate standard error of $\delta_{n,p}$ is given by the relation 

$$\text{S.E.}(\delta_{n,p}) \approx \frac{1}{2\sigma}\,\text{S.E.}(\delta^2_{n,p}) = \frac{\sigma}{\sqrt{2(n-1)W(n,p)}}.$$

If p has been so chosen that $\nu^2_{n,p}$ is suitably small then [see equation (11)] 
some confidence may be put in the approximate formulas: 

(18)  $\text{S.E.}(d_{n,p}) \approx \dfrac{\sigma}{\sqrt{2(n-1)W(n,p)}},$ 

(19)  $\text{P.E.}(d_{n,p}) \approx \dfrac{.6745\,\sigma}{\sqrt{2(n-1)W(n,p)}}.$ 

Formula (19) was used in preparing Table IV which gives the approximate 
probable error to be feared in using $d_{n,p}$ as an estimate of $\sigma$. This table should 
yield interesting information whenever p has been chosen so that $d^2_{n,p}$ is a suitably 
unbiased estimate of $\sigma^2$. 
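The entries of Table IV are multipliers of $\sigma$ and can be reproduced from formula (19). The sketch below is illustrative and restricted to the case 2p < n, where the efficiency is taken in the closed form $(n-p)^2\binom{2p}{p}^2 / [(n-1)\{(n-p)\binom{4p}{2p} - 2p\binom{2p-1}{p}^2\}]$; the function names are ours.

```python
from math import comb, sqrt


def efficiency_large_n(n: int, p: int) -> float:
    """W(n, p) in closed form; only intended for p >= 1 and 2p < n."""
    if p < 1 or 2 * p >= n:
        raise ValueError("closed form used here assumes p >= 1 and 2p < n")
    braces = (n - p) * comb(4 * p, 2 * p) - 2 * p * comb(2 * p - 1, p) ** 2
    return (n - p) ** 2 * comb(2 * p, p) ** 2 / ((n - 1) * braces)


def probable_error_factor(n: int, p: int) -> float:
    """Multiplier of sigma in the approximate probable error of d_{n,p}."""
    return 0.6745 / sqrt(2 * (n - 1) * efficiency_large_n(n, p))


print(round(probable_error_factor(10, 3), 4))
print(round(probable_error_factor(40, 10), 4))
```

Multiplying the printed factor by an estimate of $\sigma$ gives the probable error in the units of the observations, subject to the proviso in the text that the trend contribution be small.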


10. Remarks. We have presented a useful technique for estimating variance 
from higher order differences and have given the precision of our estimate. The 
method of estimating variance from higher order differences appears to be quite 
valuable in cases where the type of trend in our observations is unknown. A 
considerable field of work remains concerning a complete investigation of the 
distribution and other properties of the statistic $d^2_{n,p}$. In this connection, 
n 

Baer [ 12 ] has already published a study on the stochastic limit of--d a Wil . 

It is hoped that others will contribute to the problem of estimating dispersion 



TABLE IV 

The Probable Error In Estimating $\sigma$ From Differences* 

  n     p=0     p=1     p=2     p=3     p=4     p=5     p=6     p=7     p=8     p=9    p=10
  1    .4769
  2    .3373   .4769
  3    .2753   .3771   .4769
  4    .2384   .3180   .4054   .4769
  5    .2133   .2796   .3495   .4215   .4769
  6    .1948   .2524   .3104   .3704   .4404   .4769
  7    .1803   .2317   .2817   .3318   .3855   .4390   .4769
  8    .1686   .2154   .2596   .3024   .3477   .3969   .4442   .4769
  9    .1589   .2022   .2420   .2794   .3183   .3604   .4057   .4481   .4769
 10    .1508   .1911   .2274   .2610   .2948   .3311   .3708   .4128   .4513   .4769
 11    .1438   .1816   .2153   .2457   .2758   .3074   .3417   .3794   .4186   .4537   .4769
 12    .1376   .1734   .2048   .2328   .2599   .2880   .3180   .3508   .3867   .4234   .4558
 13    .1323   .1663   .1958   .2217   .2465   .2717   .2983   .3272   .3587   .3930   .4276
 14    .1274   .1599   .1878   .2120   .2350   .2579   .2818   .3073   .3351   .3656   .3984
 15    .1231   .1542   .1808   .2035   .2248   .2459   .2677   .2905   .3152   .3423   .3718
 16    .1192   .1491   .1744   .1960   .2159   .2355   .2554   .2761   .2983   .3223   .3485
 17    .1156   .1445   .1687   .1892   .2080   .2262   .2447   .2637   .2837   .3052   .3286
 18    .1124   .1403   .1636   .1831   .2009   .2180   .2352   .2527   .2710   .2905   .3116
 19    .1094   .1364   .1589   .1775   .1945   .2106   .2267   .2430   .2599   .2777   .2967
 20    .1066   .1328   .1545   .1724   .1886   .2040   .2191   .2343   .2500   .2663   .2837
 21    .1040   .1295   .1505   .1677   .1832   .1978   .2121   .2264   .2411   .2562   .2722
 22    .1016   .1265   .1468   .1634   .1783   .1922   .2058   .2193   .2331   .2472   .2620
 23    .0994   .1236   .1433   .1591   .1738   .1871   .2000   .2129   .2258   .2391   .2529
 24    .0973   .1209   .1401   .1557   .1695   .1824   .1948   .2069   .2191   .2316   .2446
 25    .0954   .1184   .1371   .1522   .1656   .1779   .1898   .2015   .2131   .2249   .2370
 26    .0935   .1160   .1343   .1490   .1619   .1739   .1853   .1964   .2075   .2187   .2301
 27    .0918   .1138   .1316   .1459   .1585   .1700   .1810   .1917   .2023   .2130   .2238
 28    .0902   .1117   .1291   .1431   .1553   .1664   .1770   .1873   .1975   .2077   .2180
 29    .0885   .1097   .1268   .1404   .1522   .1631   .1733   .1832   .1930   .2028   .2126
 30    .0871   .1078   .1245   .1378   .1493   .1599   .1698   .1794   .1888   .1981   .2076
 31    .0857   .1060   .1224   .1354   .1466   .1569   .1665   .1758   .1848   .1938   .2029
 32    .0843   .1043   .1204   .1331   .1441   .1540   .1634   .1724   .1811   .1898   .1985
 33    .0831   .1027   .1184   .1309   .1416   .1514   .1605   .1692   .1776   .1860   .1944
 34    .0818   .1012   .1166   .1288   .1393   .1488   .1577   .1661   .1744   .1825   .1905
 35    .0807   .0999   .1149   .1268   .1371   .1464   .1550   .1632   .1713   .1791   .1869
 36    .0795   .0983   .1132   .1249   .1350   .1441   .1525   .1605   .1683   .1759   .1834
 37    .0784   .0969   .1116   .1231   .1330   .1418   .1501   .1579   .1655   .1729   .1802
 38    .0774   .0956   .1101   .1214   .1311   .1397   .1478   .1555   .1628   .1700   .1771
 39    .0764   .0943   .1086   .1197   .1292   .1377   .1456   .1531   .1603   .1673   .1741
 40    .0754   .0931   .1072   .1181   .1274   .1358   .1435   .1508   .1578   .1646   .1713


TABLE IV (Continued) 

   n     p=0     p=1     p=2     p=3     p=4     p=5     p=6     p=7     p=8     p=9    p=10
   42   .0736   .0909   .1045   .1151   .1241   .1322   .1396   .1466   .1533   .1597   .1661
   44   .0719   .0887   .1020   .1123   .1211   .1288   .1360   .1427   .1491   .1553   .1613
   46   .0703   .0868   .0997   .1097   .1182   .1257   .1326   .1391   .1453   .1512   .1570
   48   .0689   .0849   .0975   .1073   .1155   .1228   .1295   .1357   .1417   .1474   .1529
   50   .0675   .0832   .0955   .1050   .1130   .1201   .1266   .1326   .1383   .1438   .1492
   52   .0661   .0815   .0936   .1029   .1107   .1176   .1238   .1297   .1352   .1405   .1457
   54   .0649   .0800   .0918   .1009   .1085   .1152   .1213   .1270   .1323   .1375   .1425
   56   .0637   .0785   .0901   .0990   .1064   .1129   .1189   .1244   .1296   .1346   .1394
   58   .0626   .0771   .0885   .0972   .1045   .1108   .1166   .1220   .1271   .1319   .1366
   62   .0606   .0746   .0855   .0939   .1008   .1069   .1125   .1176   .1224   .1270   .1313
   66   .0587   .0723   .0828   .0909   .0975   .1034   .1087   .1136   .1182   .1225   .1266
   70   .0570   .0702   .0804   .0881   .0946   .1002   .1053   .1100   .1144   .1185   .1224
   74   .0554   .0682   .0781   .0856   .0919   .0973   .1022   .1067   .1109   .1149   .1186
   78   .0540   .0664   .0760   .0833   .0894   .0947   .0994   .1037   .1077   .1115   .1152
   82   .0527   .0648   .0741   .0812   .0871   .0922   .0968   .1009   .1048   .1085   .1120
   90   .0503   .0618   .0707   .0774   .0830   .0878   .0921   .0960   .0997   .1031   .1063
   98   .0482   .0592   .0677   .0741   .0794   .0840   .0880   .0917   .0952   .0984   .1014
  106   .0463   .0569   .0650   .0712   .0762   .0806   .0845   .0880   .0913   .0943   .0972
  114   .0447   .0549   .0627   .0686   .0734   .0776   .0813   .0847   .0878   .0907   .0934
  122   .0432   .0530   .0606   .0663   .0709   .0749   .0785   .0817   .0847   .0875   .0900
  138   .0406   .0498   .0569   .0622   .0666   .0703   .0736   .0766   .0794   .0819   .0843
  154   .0384   .0472   .0538   .0589   .0630   .0664   .0695   .0723   .0749   .0773   .0795
  170   .0366   .0449   .0512   .0560   .0599   .0632   .0661   .0687   .0711   .0734   .0755
  202   .0336   .0412   .0470   .0513   .0548   .0578   .0605   .0629   .0650   .0671   .0689
  234   .0312   .0382   .0436   .0476   .0509   .0537   .0561   .0583   .0603   .0621   .0639
  266   .0292   .0359   .0409   .0446   .0477   .0503   .0525   .0546   .0565   .0582   .0598
  330   .0262   .0322   .0367   .0400   .0428   .0451   .0471   .0489   .0505   .0521   .0535
  394   .0240   .0295   .0336   .0366   .0391   .0412   .0430   .0447   .0462   .0475   .0488
  522   .0209   .0256   .0292   .0318   .0339   .0357   .0373   .0387   .0400   .0412   .0423
  778   .0171   .0210   .0239   .0260   .0278   .0292   .0305   .0317   .0327   .0337   .0346
 1290   .0133   .0163   .0185   .0202   .0216   .0227   .0237   .0246   .0254   .0261   .0268
 2314   .0099   .0121   .0138   .0151   .0161   .0169   .0177   .0183   .0189   .0195   .0200

* If $d^2_{n,p}$ is a sufficiently unbiased estimate of $\sigma^2$, then the approximate probable error 
to be feared in using $d_{n,p}$ as an estimate of $\sigma$ may be obtained by multiplying the 
tabular entries by $\sigma$. 





when observed data display trends as it is believed that the method of differences 
deserves much attention. In particular, it is hoped that someone will have the 
time and ingenuity to calculate the distribution of the statistic 





Were this done, an admirable criterion would be at hand for gauging the significance 
of a change in the estimate of $\sigma$ as we pass from differences of order p to 
those of order p + 1. Of course, useful information in this connection could be 
obtained from a knowledge of the distributions of $\delta^2_{n,p}$ and $\delta^2_{n,p+1}$; in fact their 
variances as herein calculated give us a basis for somewhat reasonable conclusions. 
An expression for the standard error of the difference between the 
estimates of $\sigma$ from two consecutive series of finite differences is given in 
[13, Chapter VI]. 

In connection with testing goodness of fit, it would be valuable also to know 
the distribution of 

S\, p 

*2 ’ 

where $S^2_{n,p}$ is the estimate of variance derived from the least squares fitting of a 
polynomial of degree p. 

For convenience of reference, we conclude the paper with 

11. A concise description of the method and its precision. It frequently 
happens that successive observations made at regular intervals are subject to 
the same standard error $\sigma$ while the means of the populations from which they 
are drawn display a trend. We give here a method of estimating the variance $\sigma^2$ 
and of determining the precision of our estimate. This method is primarily of 
value when the trend is unknown; however even when the type of trend is known, 
its computational simplicity may make the method advantageous. 

The method. Arrange the data in a vertical column and then in the usual 
way form difference columns of order 1, 2, ..., p. Sum the squares of the pth 
order differences and divide by the number $(n-p)\binom{2p}{p}$. Our estimate of $\sigma^2$ 
is the number $d^2_{n,p}$, where 

$$d^2_{n,p} = \frac{1}{\binom{2p}{p}(n-p)}\sum_{i=p+1}^{n}(\Delta^p x_i)^2.$$
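The recipe just described translates almost line for line into code. The following Python sketch is a minimal illustration, with the function name `difference_variance` chosen here and not taken from the paper: it forms the difference columns by repeated subtraction of neighbours and divides the sum of squares of the pth column by $(n-p)\binom{2p}{p}$.

```python
from math import comb


def difference_variance(x, p):
    """Estimate of variance from pth order differences of equally spaced data."""
    n = len(x)
    if not 0 <= p < n:
        raise ValueError("need 0 <= p < n")
    column = list(x)
    for _ in range(p):  # each pass produces the next difference column
        column = [later - earlier for earlier, later in zip(column, column[1:])]
    return sum(v * v for v in column) / (comb(2 * p, p) * (n - p))


# A pure quadratic trend with no error is removed entirely by third differences.
quadratic = [3.0 + 2.0 * t + 0.5 * t * t for t in range(10)]
print(difference_variance(quadratic, 1), difference_variance(quadratic, 3))
```

With real data one would compute the estimate for several successive orders p and look for the order at which it stops decreasing appreciably, as the example at the end of the paper does.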



(x* — 2x t+ i + x t4 *)* 


n 


Xi 


4 Dixon [9J gives moments of the statistic 


23 (xi - xi +l y 


where x n +i 



The precision. The precision of this estimate may be determined from the 
following information (which has been derived in the present paper): 

$$E(d^2_{n,p}) = \sigma^2 + \nu^2_{n,p};$$

$$\nu^2_{n,p} \le \frac{1}{\binom{2p}{p}}\left(\frac{b-a}{n-p}\right)\left(\frac{b-a}{n-1}\right)^{2p-1}\int_a^b\frac{[f^{(p)}(s)]^2\,ds}{b-a};$$

$$\mathrm{Var}(d^2_{n,p}) \le \mathrm{Var}(\delta^2_{n,p}) + 4\nu^2_{n,p}\sigma^2;$$

$$\mathrm{Var}(\delta^2_{n,p}) = \frac{2\sigma^4}{(n-1)\,W(n,p)},$$

where W(n, p) is given in Table I. 


TABLE V 

  p      σ_x       σ_y       σ_z
  1     18.90    184.62     11.22
  2      1.21      1.88     10.56
  3       .88      1.85     10.30
  4       .87      1.84     10.12
  5       .86      1.83     10.01


In case $\nu^2_{n,p}$ is sufficiently small (this is determined by the
requirements of the given problem), then Table IV may be used directly to
determine the approximate probable error in using $d_{n,p}$ as an estimate of
$\sigma$.

An example. As a practical example of the use of the method of differences
when the trend is unknown and of the stability of the statistic $d^2_{n,p}$
with respect to $p$, we mention a recent problem at Aberdeen Proving Ground
which had to do with estimating the accuracy with which certain photographic
measurements locate a moving object. Ballistic Cameras were used to determine
the horizontal coordinates $x$ and $y$ and the vertical coordinate $z$ (all in
feet) of an airplane traveling about 160 mph at an elevation of about 35,000
feet. An automatic pilot was in use in the airplane as it flew over a three
mile course. At one second intervals for a period of 70 seconds two Ballistic
Cameras, 5000 feet apart, were used to locate the plane. Since the plane was
traveling pretty much in the $y$ direction one would expect that first
differences would yield a standard error in $y$ far in excess of its true one;
that second differences would furnish a much better estimate; and that perhaps
third differences would yield a still more trustworthy one. No matter what
order of difference is used we never expect such an estimate to be too small.
In this problem, the standard errors in $x$, $y$, $z$ as estimated from
differences of certain orders, $p$, were as given in Table V.
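The method of differences described above can be sketched in a few lines of
Python. This is only an illustrative sketch: the function name
`difference_variance_estimate` and the synthetic data (a quadratic trend plus
noise of unit standard error) are assumptions made for the illustration and
are not the Aberdeen measurements. It forms the $p$th order difference column
and divides the sum of squares by $\binom{2p}{p}(n - p)$.

```python
import math
import random


def difference_variance_estimate(x, p):
    """Estimate sigma^2 from the p-th order differences of equally spaced data."""
    n = len(x)
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    diffs = list(x)
    for _ in range(p):
        # One pass turns the current column into the next difference column.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs) / (math.comb(2 * p, p) * (n - p))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: quadratic trend plus noise with sigma = 1.
    data = [0.05 * t * t + rng.gauss(0.0, 1.0) for t in range(70)]
    for p in range(1, 6):
        estimate = math.sqrt(difference_variance_estimate(data, p))
        print(p, round(estimate, 3))
```

With a trend of this kind one would expect the estimate for $p = 1$ to be
inflated and the estimates for larger $p$ to settle near the true standard
error, in the manner of Table V.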





ANTHONY P. MORSE AND FRANK E. GRUBBS 


REFERENCES 

[1] A. A. Bennett, unpublished report to the Chief of Ordnance, U. S. Army, circa 1918.

[2] R. H. Kent and J. von Neumann, "The estimation of the probable error from
successive differences", Ballistic Res. Lab. Report No. 175, Feb., 1940.

[3] John von Neumann, R. H. Kent, H. R. Bellinson, and B. I. Hart, "The mean
square successive difference", Annals of Math. Stat., Vol. 12 (1941), pp. 153-162.

[4] J. D. Williams, "Moments of the ratio of the mean square successive difference to
the mean square difference in samples from a normal universe", Annals of Math.
Stat., Vol. 12 (1941), pp. 239-241.

[5] John von Neumann, "Distribution of the ratio of the mean square successive
difference to the variance", Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

[6] John von Neumann, "A further remark concerning the distribution of the ratio of
the mean square successive difference to the variance", Annals of Math. Stat.,
Vol. 13 (1942), pp. 86-88.

[7] B. I. Hart, "Tabulation of the probabilities for the ratio of the mean square
successive difference to the variance", Annals of Math. Stat., Vol. 13 (1942), pp.
207-214.

[8] B. I. Hart, "Significance levels for the ratio of the mean square successive difference
to the variance", Annals of Math. Stat., Vol. 13 (1942), pp. 445-446.

[9] Wilfrid J. Dixon, "Further contributions to the problem of serial correlation",
Annals of Math. Stat., Vol. 15 (1944), pp. 119-144.

[10] Anthony P. Morse, "The estimation of dispersion from differences", Ballistic Res.
Lab. Report No. 557, July, 1945.

[11] R. A. Fisher, Phil. Trans. A., Vol. 222 (1922), p. 316.

[12] Reinhold Baer, "Sampling from a changing population", Annals of Math. Stat.,
Vol. 16 (1945), pp. 348-361.

[13] G. Tintner, The Variate Difference Method (Cowles Commission Monograph No. 5),
Principia Press, Bloomington, Indiana, 1940.



THE EFFICIENCY OF SEQUENTIAL ESTIMATES AND WALD'S 
EQUATION FOR SEQUENTIAL PROCESSES 

By J. Wolfowitz
Columbia University 

1. Summary. Let $n$ successive independent observations be made on the
same chance variable whose distribution function $f(x, \theta)$ depends on a
single parameter $\theta$. The number $n$ is a chance variable which depends
upon the outcomes of successive observations; it is precisely defined in the
text below. Let $\theta^*(x_1, \ldots, x_n)$ be an estimate of $\theta$ whose
bias is $b(\theta)$. Subject to certain regularity conditions stated below, it
is proved that

$$\sigma^2(\theta^*) \ge \left(1 + \frac{db}{d\theta}\right)^2 \left[E n \, E\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]^{-1}.$$

When $f(x, \theta)$ is the binomial distribution and $\theta^*$ is unbiased the
lower bound given here specializes to one first announced by Girshick [3],
obtained under no doubt different conditions of regularity. When the chance
variable $n$ is a constant the lower bound given above is the same as that
obtained in [2], page 480, under different conditions of regularity.$^1$

Let the parameter $\theta$ consist of $l$ components $\theta_1, \ldots,
\theta_l$ for which there are given the respective unbiased estimates
$\theta_1^*(x_1, \ldots, x_n), \ldots, \theta_l^*(x_1, \ldots, x_n)$. Let
$\|\lambda_{ij}\|$ be the non-singular covariance matrix of the latter, and
$\|\lambda^{ij}\|$ its inverse. The concentration ellipsoid in the space of
$(k_1, \ldots, k_l)$ is defined as

$$\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2.$$

(This valuable concept is due to Cramér.) If a unit mass be uniformly
distributed over the concentration ellipsoid, the matrix of its products of
inertia will coincide with the covariance matrix $\|\lambda_{ij}\|$. In [4]
Cramér proves that no matter what the unbiased estimates $\theta_1^*, \ldots,
\theta_l^*$ (provided that certain regularity conditions are fulfilled), when
$n$ is constant their concentration ellipsoid always contains within itself
the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$$

where

$$\mu_{ij} = n E\left(\frac{\partial \log f}{\partial \theta_i} \frac{\partial \log f}{\partial \theta_j}\right).$$

1 To whom this result is to be ascribed is not clear from the context in which Professor
Cramér describes it (in [2]). After the present paper was completed the author learned of
the papers by Rao [8] and Aitken and Silverstone [9], both of which deal with this question.
The author is indebted to Prof. M. S. Bartlett for drawing his attention to these papers.



Consider now the sequential procedure of this paper. Let $\theta_1^*, \ldots,
\theta_l^*$ be, as before, unbiased estimates of $\theta_1, \ldots, \theta_l$,
respectively, recalling, however, that the number $n$ of observations is a
chance variable. It is proved that the concentration ellipsoid of
$\theta_1^*, \ldots, \theta_l^*$ always contains within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$$

where

$$\mu_{ij} = E n \, E\left(\frac{\partial \log f}{\partial \theta_i} \frac{\partial \log f}{\partial \theta_j}\right).$$

When $n$ is a constant this becomes Cramér's result (under different conditions
of regularity).

In section 7 is presented a number of results related to the equation
$E Z_n = E n \, E X$, which is due to Wald [6] and is fundamental for sequential
analysis.


2. Introduction. Let $X$ be a chance variable whose distribution function
$f(x, \theta)$ depends on the parameter $\theta$. It is assumed that $X$ either
has a probability density function (which we then denote by $f(x, \theta)$) or
that it can take only an at most denumerable number of discrete values (in the
latter case $f(x, \theta) = P\{X = x\}$, where the latter symbol denotes the
probability of the relation in braces). Let $\omega = x_1, x_2, \ldots$ be an
infinite sequence of observations on $X$, and let $\Omega$ be the space of
"points" $\omega$. Let there be given an infinite sequence of Borel measurable
functions $\varphi_1(x_1), \varphi_2(x_1, x_2), \ldots, \varphi_j(x_1, \ldots,
x_j), \ldots$ defined for all $\omega$ in $\Omega$, such that each takes only
the values zero and one. It is well known that the function $f(x, \theta)$
defines a measure (probability) on a Borel field in $\Omega$. We assume that
everywhere in $\Omega$, except possibly on a set whose probability is zero for
all $\theta$ under consideration, at least one of the functions $\varphi_1,
\varphi_2, \ldots$ takes the value one. Let $n(\omega)$ be the smallest integer
at which this occurs. Thus $n(\omega)$ is a chance variable.

In statistical applications the chance variable $n(\omega)$ may be interpreted
as a rule for terminating a sequence of observations on the chance variable
$X$, the probability of termination being one, and the decision to terminate
depending only upon the observations obtained. A sequential test is an example
of this procedure. The converse is, however, not true, because the process
described above does not require that any statistical decision should be
reached when the process of drawing observations is terminated.
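As an illustration of this definition, the following Python sketch treats a
stopping rule as a family of zero-one functions and returns the first index at
which one of them takes the value one. The names `stopping_time` and `phi`,
and the particular rule (stop when the running sum reaches 3, or at the tenth
observation), are hypothetical choices made only for the example.

```python
def stopping_time(observations, rule):
    """Return (n, [x_1, ..., x_n]) where n is the first j with phi_j = 1.

    `rule(j, xs)` plays the role of phi_j(x_1, ..., x_j) and must return
    0 or 1; it may look only at the observations obtained so far.
    """
    seen = []
    for j, x in enumerate(observations, start=1):
        seen.append(x)
        if rule(j, seen) == 1:
            return j, seen
    raise RuntimeError("sequence ended before any phi_j took the value one")


def phi(j, xs):
    # Hypothetical rule: stop once the running sum reaches 3, or at j = 10.
    return 1 if sum(xs) >= 3 or j == 10 else 0


if __name__ == "__main__":
    n, xs = stopping_time(iter([1, 0, 1, 1, 0, 1]), phi)
    print(n, xs)
```

Because `rule` sees only the observations already drawn, the decision to
terminate depends only upon the observations obtained, as required in the text.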

An "estimate" of $\theta$ is a function $\theta^*(x_1, \ldots, x_n)$ of the
observations $x_1, \ldots, x_n$ (those obtained prior to the "termination" of
the process of drawing observations). In the sequel we shall limit ourselves
to estimates whose second moments are finite. The estimate is "unbiased" if
$E\theta^*$, the expected value of $\theta^*$, is $\theta$. When this is not so
$E\theta^* - \theta$ is called the bias, $b(\theta)$, of $\theta^*$. In general
the bias is a function of $\theta$. It is obvious that the function $\theta^*$
may be undefined on a set of points $(x_1, \ldots, x_n)$ whose probability is
zero for all $\theta$ under consideration.





In the present paper we shall be concerned with an upper bound on the
efficiency of a sequential estimate, or, more precisely, with a lower bound on
its variance. This lower bound is intimately related to certain results on the
efficiency of the maximum likelihood estimate from a sample of fixed size. This
is not surprising since fixed-size sampling is a special instance of sequential
sampling. The results obtained in this paper are also obviously and intimately
related to those due to Cramér [4] and those described by him in [2], pp. 477-488.
Naturally the conditions of regularity (restrictions on $f(x, \theta)$,
$\theta^*$, etc.) under which the results are proved are different. For
example, no restrictions on the sequential sampling procedure need appear in
the statement of a theorem which deals only with samples of fixed size.

The argument below proceeds as if $f(x, \theta)$ were a probability density
function. The results apply equally well to the case where $f(x, \theta)$ is
the probability function of a discrete chance variable provided:

1). Integration is replaced by summation wherever this is obviously required.

2). The phrase "almost all points" in a Euclidean space of any finite
dimensionality is understood

a). as all points in the space with the possible exception of a set of Lebesgue
measure zero, when $f(x, \theta)$ is a probability density function

b). as all points in the space with the possible exception of points one of whose
coordinates is a member of the set $Z$, when $f(x, \theta)$ is the probability
function of a discrete chance variable. The set $Z$ consists of all points $z$
such that $f(z, \theta) = 0$ identically for all $\theta$ under consideration.

3. Conditions of regularity. In this section we shall formulate the restrictions
which we impose on $f$, the estimates, and the sequential process. They are
intended to be such as will be satisfied in most cases of statistical interest. No
doubt they can be weakened, but the author has decided against attempting to
do so here. The list may seem long for two reasons. Seldom in the literature
are the assumptions which, for example, lead to validation of differentiation
under the integral sign etc., formulated explicitly. The presence of a sequential
procedure means that additional restrictions must be imposed.

In this section we assume that $\theta$ is a single parameter. The case where
$\theta$ has more than one component is treated later.

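(3.1). The parameter $\theta$ lies in an open interval $D$ of the real line.
$D$ may consist of the entire line or of an entire half-line.

(3.2). The derivative $\frac{\partial f}{\partial \theta}$ exists for all
$\theta$ in $D$ and almost all $x$. We define $\frac{\partial \log f}{\partial
\theta}$ as zero whenever $f(x, \theta) = 0$; thus $\frac{\partial \log
f}{\partial \theta}$ is defined for all $\theta$ in $D$ and almost all $x$. We
postulate that $E\left(\frac{\partial \log f}{\partial \theta}\right) = 0$ and
that $E\left(\frac{\partial \log f}{\partial \theta}\right)^2$ be not zero for
all $\theta$ in $D$.

(3.3). $E\left(\sum_{i=1}^{n} \left|\frac{\partial \log f(x_i, \theta)}{\partial
\theta}\right|\right)^2$ exists for all $\theta$ in $D$.

(3.4). Let $R_j$, $(j = 1, 2, \ldots)$, be the set of points $(x_1, \ldots,
x_j)$ in the $j$-dimensional Euclidean space such that

$$\varphi_i(x_1, \ldots, x_i) = 0, \quad i = 1, 2, \ldots, j - 1,$$

$$\varphi_j(x_1, \ldots, x_j) = 1.$$

For any integral $j$ there exists a non-negative L-measurable function
$T_j(x_1, \ldots, x_j)$ such that

a). $$\left|\theta^*(x_1, \ldots, x_j) \frac{\partial}{\partial \theta} \prod_{i=1}^{j} f(x_i, \theta)\right| < T_j(x_1, \ldots, x_j)$$

for all $\theta$ in $D$ and almost all $(x_1, \ldots, x_j)$ in $R_j$;

b). $$\int_{R_j} T_j(x_1, \ldots, x_j) \, dx_1 \cdots dx_j$$

is finite.

(3.5). Let

$$\xi_j(\theta) = \int_{R_j} \theta^*(x_1, \ldots, x_j) \prod_{i=1}^{j} f(x_i, \theta) \, dx_i, \quad (j = 1, 2, \ldots).$$

We postulate the uniform convergence of the series

$$\sum_{j} \frac{d\xi_j(\theta)}{d\theta}$$

(the existence of $\frac{d\xi_j(\theta)}{d\theta}$ is a consequence of
Assumption (3.4)) for all $\theta$ in $D$.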


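4. The case of one parameter. In this section we assume that $f(x, \theta)$
depends on a single parameter $\theta$. In sections 5 and 6 we shall discuss
the case when $\theta$ is a vector with more than one component.

We have

$$E\left(\frac{\partial \log f(X, \theta)}{\partial \theta}\right) = 0$$

by (3.2). Define the chance variable

$$Y_n = \sum_{i=1}^{n} \frac{\partial \log f(x_i, \theta)}{\partial \theta}.$$

By an argument almost identical with that of [1], Theorem 1, or of Theorem 7.1
below, we have

$$E Y_n = 0. \tag{4.1}$$

From Theorem 7.2 below we obtain

$$\sigma^2(Y_n) = E n \, E\left(\frac{\partial \log f(X, \theta)}{\partial \theta}\right)^2. \tag{4.2}$$

Let $\theta^*(x_1, \ldots, x_n)$ be an estimate of $\theta$ such that

$$E\theta^* = \theta + b(\theta).$$

Then

$$\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, \ldots, x_j) \prod_{i=1}^{j} f(x_i, \theta) \, dx_i = \theta + b(\theta). \tag{4.3}$$

Differentiation of both members of (4.3) with respect to $\theta$ (Assumptions
(3.4) and (3.5)) gives

$$E\theta^* Y_n = 1 + \frac{db}{d\theta}. \tag{4.4}$$

From (4.1) it follows that (4.4) gives the covariance between $\theta^*$ and
$Y_n$. Hence, from (4.2) and Schwarz's inequality,

$$\sigma^2(\theta^*) \ge \left(1 + \frac{db}{d\theta}\right)^2 \left[E n \, E\left(\frac{\partial \log f(X, \theta)}{\partial \theta}\right)^2\right]^{-1}. \tag{4.5}$$

When the bias $b(\theta)$ is constant, for example when $b(\theta) \equiv 0$ in
case $\theta^*$ is an unbiased estimate, we have from (4.5)

$$\sigma^2(\theta^*) \ge \left[E n \, E\left(\frac{\partial \log f(X, \theta)}{\partial \theta}\right)^2\right]^{-1}. \tag{4.6}$$

The equality sign in (4.6) will hold if $\theta^*$ may be written as
$Z'(\theta)Y_n + Z''(\theta)$, where $Z'$ and $Z''$ are functions of $\theta$.
However, $\theta^*$ itself should not be a function of $\theta$ if our argument
is to remain valid. The subject is connected with the question of the
existence of a sufficient estimate.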

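Let $f(x, \theta)$ be defined as follows:

$$f(x, \theta) = \theta^x (1 - \theta)^{1-x}, \quad (x = 0 \text{ or } 1; \; 0 < \theta < 1).$$

Then

$$\frac{\partial \log f(x, \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta}, \qquad E\left(\frac{\partial \log f}{\partial \theta}\right)^2 = \frac{1}{\theta(1 - \theta)}.$$

Suppose $\theta^*$ is unbiased. Then $\sigma^2(\theta^*) \ge \theta(1 -
\theta)(E n)^{-1}$, a result first given by Girshick [3] under unspecified
regularity conditions.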
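The binomial bound can be checked numerically on one simple sequential scheme.
The sketch below is an illustration and not part of the paper's argument: it
assumes the rule "sample until the second success" and the estimate
$1/(n - 1)$, which is unbiased for that rule (a standard fact about inverse
sampling), and it does not verify the regularity conditions of section 3. The
function name `inverse_sampling_summary` and the truncation parameter `terms`
are hypothetical.

```python
import math


def inverse_sampling_summary(theta, terms=2000):
    """Sample Bernoulli(theta) until the 2nd success; estimate theta by 1/(n-1).

    Returns (mean, variance, bound) where bound = theta * (1 - theta) / E(n).
    The sums over n = 2, 3, ... are truncated after `terms` terms.
    """
    q = 1.0 - theta
    mean = second = expected_n = 0.0
    for n in range(2, 2 + terms):
        prob = (n - 1) * theta ** 2 * q ** (n - 2)  # P{n observations needed}
        est = 1.0 / (n - 1)
        mean += prob * est
        second += prob * est * est
        expected_n += prob * n
    variance = second - mean * mean
    bound = theta * q / expected_n
    return mean, variance, bound


if __name__ == "__main__":
    mean, variance, bound = inverse_sampling_summary(0.5)
    print(mean, variance, bound, variance >= bound)
```

For this scheme the variance of the estimate exceeds the bound, so the
equality sign in (4.6) is not attained.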

Let the functions $\varphi_1, \varphi_2, \ldots$ be such that $n(\omega)$ is a
constant. We are then dealing with samples of fixed size. The result (4.5) is
then given in [2], p. 480, under different conditions of regularity.


5. Regularity conditions for the case when $\theta$ has more than one component.
We suppose that $\theta = (\theta_1, \ldots, \theta_l)$ and that simultaneous
estimates $\theta_1^*(x_1, \ldots, x_n), \ldots, \theta_l^*(x_1, \ldots, x_n)$
of the components of $\theta$ are under discussion. In the sequel we shall
limit ourselves to the case when these estimates are all unbiased.

We postulate the following regularity conditions which are sufficient to validate
section 6:

(5.1). The covariance matrix of the estimates $\theta_1^*, \ldots, \theta_l^*$
is non-singular for all $\theta$ in $D$ (this time $D$ is an open interval of
the $l$-dimensional parameter space).

(5.2). The conditions of section 3 are satisfied for each $\theta_i$ and
$\theta_i^*$ $(i = 1, \ldots, l)$.

6. The ellipsoid of concentration when $\theta$ has more than one component. Let

$$\theta = (\theta_1, \ldots, \theta_l).$$

We shall first describe briefly the result of Cramér [4] which refers to samples
of fixed size $n > l$. Let $\theta_i^*(x_1, \ldots, x_n)$ be an unbiased
estimate of $\theta_i$, $(i = 1, \ldots, l)$. Let $\|\lambda_{ij}\|$ be the
non-singular covariance matrix of the $\theta_i^*$, and let $\|\lambda^{ij}\|$
be its inverse. The "ellipsoid of concentration" in the space of points
$(k_1, \ldots, k_l)$ is defined as

$$\sum_{i,j=1}^{l} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2. \tag{6.1}$$

If a unit mass be distributed uniformly over this ellipsoid it will have the point
$(\theta_1, \ldots, \theta_l)$ as its center of gravity and $\lambda_{ij}$ as
its product of inertia about the corresponding axes. Cramér proves that,
subject to certain regularity conditions, there is a fixed ellipsoid

$$\sum_{i,j=1}^{l} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.2}$$

where

$$\mu_{ij} = n E\left(\frac{\partial \log f}{\partial \theta_i} \frac{\partial \log f}{\partial \theta_j}\right),$$

which is always contained entirely within the concentration ellipsoid of any set
of unbiased estimates. The two ellipsoids coincide only under certain
conditions, among which is that the $\theta_i^*$ be jointly sufficient
estimates of the $\theta_i$.

Let us now consider the sequential procedure of this paper and postulate the
regularity conditions of section 5. Let

$$K = \|k_{ij}\|$$

be a matrix with real elements such that $|K| = 1$ and let

$$K^{-1} = \|k^{ij}\|$$

be its inverse. Let

$$\|\theta\| = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_l \end{pmatrix}, \quad \|\theta^*\| = \begin{pmatrix} \theta_1^* \\ \vdots \\ \theta_l^* \end{pmatrix}, \quad \|\psi\| = \begin{pmatrix} \psi_1 \\ \vdots \\ \psi_l \end{pmatrix}$$

be column matrices. Suppose

$$\|\psi\| = K\|\theta\|. \tag{6.3}$$

Then

$$\|\theta\| = K^{-1}\|\psi\|. \tag{6.4}$$

Define

$$\|\psi^*\| = K\|\theta^*\|.$$

From section 4 we have

$$E n \, E\left(\frac{\partial \log f}{\partial \psi_1}\right)^2 \ge \left[\sigma^2(\psi_1^*)\right]^{-1} \tag{6.5}$$

where the differentiation by which $\frac{\partial \log f}{\partial \psi_1}$
is obtained is performed with $\psi_2, \ldots, \psi_l$ held constant. Consider
the last $(l - 1)$ rows of $K$ as fixed and $(k_{11}, k_{12}, \ldots, k_{1l})$
as free to vary subject only to the restriction that $|K| = 1$. The left member
of (6.5) is then a fixed quantity, while the right member is a function of the
first row of $K$. The inequality (6.5) must remain valid for all admissible
$(k_{11}, \ldots, k_{1l})$. Hence (6.5) will remain valid if the right member
of (6.5) is replaced by its maximum with respect to $(k_{11}, \ldots,
k_{1l})$. We shall obtain this maximum and find that (6.5) then implies a
result about the minimal ellipsoid of concentration.

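The problem is therefore to minimize $\sigma^2(\psi_1^*)$. Now

$$\sigma^2(\psi_1^*) = \sum_{i,j} \lambda_{ij} k_{1i} k_{1j}. \tag{6.6}$$

The family of ellipsoids in the space of $(k_{11}, \ldots, k_{1l})$

$$\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c, \tag{6.7}$$

where $c$ is a running parameter, has all centers located at the origin. Let

$$(k_{11}^0, \ldots, k_{1l}^0)$$

be the sought-for maximizing values of $(k_{11}, \ldots, k_{1l})$. From the
definitions of $K$ and $K^{-1}$ we have

$$\sum_{i} k_{1i} k^{i1} = 1, \tag{6.8}$$

where $(k^{11}, k^{21}, \ldots, k^{l1})$ are constants. It follows that the
minimum value $c_0$ of $\sigma^2(\psi_1^*)$ is such that the ellipsoid

$$\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c_0 \tag{6.9}$$

is tangent to the hyperplane (6.8) at the point $(k_{11}^0, \ldots,
k_{1l}^0)$. Now the tangent plane to (6.9) at this point is given by

$$\sum_{i,j} \lambda_{ij} k_{1i}^0 k_{1j} = c_0. \tag{6.10}$$

From (6.8) and (6.10) we obtain

$$c_0 k^{j1} = \sum_{i} \lambda_{ij} k_{1i}^0, \quad (j = 1, \ldots, l). \tag{6.11}$$

Hence

$$c_0 \sum_{j} \lambda^{ij} k^{j1} = k_{1i}^0, \quad (i = 1, \ldots, l), \tag{6.12}$$

from which

$$c_0 \sum_{i,j} \lambda^{ij} k^{i1} k^{j1} = 1. \tag{6.13}$$

We have

$$\frac{\partial \log f}{\partial \psi_1} = \sum_{i} k^{i1} \frac{\partial \log f}{\partial \theta_i}, \qquad \left(\frac{\partial \log f}{\partial \psi_1}\right)^2 = \sum_{i,j} k^{i1} k^{j1} \frac{\partial \log f}{\partial \theta_i} \frac{\partial \log f}{\partial \theta_j}. \tag{6.14}$$

From (6.5), (6.13), (6.14), and the definition of $c_0$ we conclude that

$$\sum_{i,j} \mu_{ij} k^{i1} k^{j1} \ge \sum_{i,j} \lambda^{ij} k^{i1} k^{j1} \tag{6.15}$$

where

$$\mu_{ij} = E n \, E\left(\frac{\partial \log f}{\partial \theta_i} \frac{\partial \log f}{\partial \theta_j}\right). \tag{6.16}$$

We may restate (6.15) as follows: The concentration ellipsoid

$$\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.17}$$

of the unbiased estimates $\theta_1^*, \ldots, \theta_l^*$ always contains
within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.18}$$

where the $\mu_{ij}$ are defined by (6.16).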

The question of the coincidence of the two ellipsoids is connected with the 
question of the existence of sufficient estimates. It may be difficult to state 
any general results about the concentration ellipsoid of biased estimates without 
postulating some relationships among the biases and/or their derivatives. 
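The containment of one ellipsoid in the other amounts to the matrix
$\|\mu_{ij}\| - \|\lambda^{ij}\|$ being positive semi-definite, and this can
be checked numerically in a simple case. The Python sketch below is an
illustration under stated assumptions: it takes the fixed-size case ($n$
constant) with normal observations of mean $m$ and variance $v$, the sample
mean and the unbiased sample variance as estimates, and their standard
covariance matrix. The helper names `inverse_2x2` and `contains` are
hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def contains(lam, mu, tol=1e-12):
    """True if the ellipsoid built from mu lies within the concentration
    ellipsoid built from the covariance matrix lam, i.e. if mu - lam^{-1}
    is positive semi-definite (checked here for the 2 x 2 case only)."""
    inv = inverse_2x2(lam)
    d = [[mu[i][j] - inv[i][j] for j in range(2)] for i in range(2)]
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    return d[0][0] >= -tol and d[1][1] >= -tol and det >= -tol


if __name__ == "__main__":
    n, v = 10, 2.0  # assumed sample size and variance
    # Covariance matrix of (sample mean, unbiased sample variance).
    lam = [[v / n, 0.0], [0.0, 2 * v * v / (n - 1)]]
    # mu_ij = n * E(d log f / d theta_i * d log f / d theta_j) for theta = (m, v).
    mu = [[n / v, 0.0], [0.0, n / (2 * v * v)]]
    print(contains(lam, mu))
```

In this case the two ellipsoids touch along the axis of the mean, where the
sample mean attains the bound, and differ along the axis of the variance.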

7. On Wald's equation and related results in sequential analysis. In section 4
we referred to a proof by Blackwell [1] of an equation due to Wald [5]
which is fundamental in the Wald theory of sequential tests of statistical
hypotheses. Here we shall give a perhaps simpler proof of this equation, and
then prove several new and related results of general interest for sequential
analysis.

The results of Theorems 7.2 and 7.3 below can be obtained by differentiation
of Wald's fundamental identity of sequential analysis ([6], [7]). However, the
conditions under which we obtain these results are less stringent than any so far
found sufficient to establish the identity and the validity of differentiating it.
Theorem 7.4 and its corollaries refer to sequential processes where the chance
variables may have different distributions or even be dependent. In the future
we hope to return to the question of finding all central moments of $Z_n$, the
problem of generalizing the fundamental identity, and related questions.

For Theorems 7.1, 7.2, and 7.3 we shall assume a chance variable $X$ whose
cumulative distribution function $F(x)$ is subject only to whatever restrictions
may be explicitly imposed on it in each theorem. We assume the existence of a
general sequential process such as is described above, which is subject only to
such restrictions as may be explicitly formulated in each theorem. The
sequential process of course defines the chance variable $n$. Let $x_1, x_2,
\ldots$ be successive independent observations on $X$. We define $Z_n =
\sum_{i=1}^{n} x_i$. If $E(X)$ and $\sigma^2(X)$ exist we shall denote them by
$w$ and $\sigma^2$, respectively.

Theorem 7.1 (Wald [5], Blackwell [1]). Suppose $w$ and $E n$ exist. Then

$$E(Z_n - nw) = 0. \tag{7.1}$$

The following theorem, which is a sort of partial converse of Theorem 7.1, is
proved concomitantly with Theorem 7.1:

Theorem 7.1.1. If $E Z_n$ exists, and if either $P\{X > 0\} = 0$ or
$P\{X < 0\} = 0$, then $w$ and $E n$ both exist, and

$$E Z_n = w E n.$$

Actually the same proof suffices for a somewhat stronger form of
Theorem 7.1.1:

Theorem 7.1.2. If $E Z_n$ exists, and if

$$E(X_i \mid n = j) \ge 0 \quad (\text{or} \le 0)$$

for all positive integral $j$ such that $P\{n = j\} \ne 0$, and all $i \le j$,
then $w$ and $E n$ both exist, and

$$E Z_n = w E n.$$
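Equation (7.1) can be verified exactly for a small discrete case by
enumerating every sample path of a sequential process that is forced to stop
after a fixed number of steps. The sketch below is illustrative only; the
function name `check_wald`, the two-point distribution and the stopping rule
are hypothetical choices.

```python
def check_wald(values, probs, stop, max_n):
    """Exactly enumerate a truncated sequential process.

    values, probs: a discrete distribution for X.
    stop(xs): True when sampling stops after observing the tuple xs;
              sampling is forced to stop at max_n so that E n is finite.
    Returns (E Z_n, w * E n).
    """
    w = sum(v * p for v, p in zip(values, probs))
    e_z = e_n = 0.0

    def walk(xs, prob):
        nonlocal e_z, e_n
        if xs and (stop(xs) or len(xs) == max_n):
            e_z += prob * sum(xs)
            e_n += prob * len(xs)
            return
        for v, p in zip(values, probs):
            walk(xs + (v,), prob * p)

    walk((), 1.0)
    return e_z, w * e_n


if __name__ == "__main__":
    # Hypothetical rule: stop when the running sum leaves (-3, 3).
    lhs, rhs = check_wald([-1, 2], [0.6, 0.4], lambda xs: abs(sum(xs)) >= 3, 8)
    print(lhs, rhs)
```

The two returned numbers agree up to rounding, since the decision to stop
depends only on the observations already obtained.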

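Theorem 7.2. If $E\left(\sum_{i=1}^{n} |x_i - w|\right)^2$ exists, then
$\sigma^2$ and $E n$ both exist, and

$$E(Z_n - nw)^2 = \sigma^2 E n. \tag{7.2}$$

We have

$$E(Z_n - nw) = E\left(\sum_{i=1}^{n} (x_i - w)\right) = \sum_{j=1}^{\infty} \int_{R_j} \left(\sum_{i=1}^{j} (x_i - w)\right) \prod_{m=1}^{j} dF(x_m) = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.3}$$

Now

$$\sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = P\{n \ge j\} E(x_j - w) = 0. \tag{7.4}$$

Hence

$$\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = 0. \tag{7.5}$$

From this (7.1) follows.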

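Suppose now that the conditions of Theorem 7.2 are fulfilled. We have

$$E(Z_n - nw)^2 = \sum_{j=1}^{\infty} \int_{R_j} \left(\sum_{i=1}^{j} (x_i - w)\right)^2 \prod_{m=1}^{j} dF(x_m)$$

$$= \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) + 2 \sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.6}$$

Let $s < j$ be any two positive integers. Then

$$\sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m) = 0. \tag{7.7}$$

Hence

$$\sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} (x_s - w)(x_j - w) \prod_{m=1}^{i} dF(x_m) = 0. \tag{7.8}$$

In a similar manner we obtain

$$\sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) = \sigma^2 P\{n \ge j\}. \tag{7.9}$$

From (7.6), (7.8), and (7.9) it therefore follows that

$$E(Z_n - nw)^2 = \sigma^2 \sum_{j=1}^{\infty} P\{n \ge j\} = \sigma^2 \sum_{j=1}^{\infty} j P\{n = j\} = \sigma^2 E n \tag{7.10}$$

which is the desired result.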
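The second-moment relation (7.2) can be checked in the same exact way on a
truncated process. The sketch below is illustrative; the function name
`second_moment_check`, the distribution and the stopping rule are hypothetical.

```python
def second_moment_check(values, probs, stop, max_n):
    """Return (E(Z_n - n w)^2, sigma^2 * E n) by exact path enumeration.

    stop(xs) decides termination from the observations so far; sampling is
    forced to stop at max_n so that every sum below is finite.
    """
    w = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - w) ** 2 for v, p in zip(values, probs))
    lhs = e_n = 0.0
    stack = [((), 1.0)]  # (observations so far, probability of this path)
    while stack:
        xs, prob = stack.pop()
        if xs and (stop(xs) or len(xs) == max_n):
            lhs += prob * (sum(xs) - len(xs) * w) ** 2
            e_n += prob * len(xs)
            continue
        for v, p in zip(values, probs):
            stack.append((xs + (v,), prob * p))
    return lhs, var * e_n


if __name__ == "__main__":
    # Hypothetical rule: stop at the second success of a Bernoulli(0.3) variable.
    lhs, rhs = second_moment_check([0, 1], [0.7, 0.3], lambda xs: sum(xs) >= 2, 10)
    print(lhs, rhs)
```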

It remains to prove the validity of rearranging the series in (7.3) and (7.6).
First, we have

$$\sum_{i=j}^{\infty} \int_{R_i} |x_j - w| \prod_{m=1}^{i} dF(x_m) = P\{n \ge j\} E|X - w|. \tag{7.11}$$

Hence it follows that

$$\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} |x_j - w| \prod_{m=1}^{i} dF(x_m) = \sum_{j=1}^{\infty} P\{n \ge j\} E|X - w| = E|X - w| \sum_{j=1}^{\infty} j P\{n = j\} = E|X - w| \, E n. \tag{7.12}$$

This justifies the rearrangement of terms in the series in (7.3). Second, the
series (7.6) is dominated by the series

$$\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - w)^2 \prod_{m=1}^{i} dF(x_m) + 2 \sum_{j=2}^{\infty} \sum_{s=1}^{j-1} \sum_{i=j}^{\infty} \int_{R_i} |x_s - w| \cdot |x_j - w| \prod_{m=1}^{i} dF(x_m) \tag{7.13}$$

all of whose terms are positive. The series (7.13) converges because

$$E\left(\sum_{i=1}^{n} |x_i - w|\right)^2 < +\infty. \tag{7.14}$$

Hence the rearrangement of the series (7.6) is valid.

In the sequel we require certain sets $R'_j$ $(j = 1, 2, \ldots)$ which we
shall define now. Let $R^*_{ij}$, $i \le j$, be the totality of all points
$(x_1, \ldots, x_j)$ such that

$$(x_1, \ldots, x_i) \in R_i. \tag{7.15}$$

Let $R^j$ be the $j$-dimensional Euclidean space. Then

$$R'_j = R^j - \sum_{i=1}^{j} R^*_{ij}. \tag{7.16}$$

We shall now prove:

Theorem 7.3. Suppose that $E\left[\sum_{i=1}^{n} |x_i - w|\right]^3$ and
$E n \left[\sum_{i=1}^{n} |x_i - w|\right]$ exist.$^2$ Then

$$E(Z_n - nw)^3 = w_3 E n + 3\sigma^2 E n(Z_n - nw) \tag{7.17}$$

where

$$w_3 = E(X - w)^3$$

exists.

2 The author has succeeded in proving that the existence of
$E\left[\sum_{i=1}^{n} |x_i - w|\right]^3$ implies the existence of
$E n \left[\sum_{i=1}^{n} |x_i - w|\right]$. The proof will be published
subsequently in connection with other results.

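Proof: We have

$$E(Z_n - nw)^3 = \sum_{j=1}^{\infty} \int_{R_j} \left[\sum_{i=1}^{j} (x_i - w)\right]^3 \prod_{m=1}^{j} dF(x_m)$$

$$= \sum_{j=1}^{\infty} \int_{R_j} \sum_{i=1}^{j} (x_i - w)^3 \prod_{m=1}^{j} dF(x_m)$$

$$+ 3 \sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m)$$

$$+ 3 \sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m)$$

$$+ 6 \sum_{j=3}^{\infty} \int_{R_j} \sum_{i=3}^{j} \sum_{s=2}^{i-1} \sum_{t=1}^{s-1} (x_t - w)(x_s - w)(x_i - w) \prod_{m=1}^{j} dF(x_m). \tag{7.18}$$

Considering the first term in the right member of (7.18), it follows that

$$\sum_{j=1}^{\infty} \int_{R_j} \left[\sum_{i=1}^{j} (x_i - w)^3\right] \prod_{m=1}^{j} dF(x_m) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_i - w)^3 \prod_{m=1}^{j} dF(x_m)$$

$$= \sum_{i=1}^{\infty} w_3 P\{n \ge i\} = \sum_{i=1}^{\infty} i w_3 P\{n = i\} = w_3 E n. \tag{7.19}$$

All the rearrangements of terms in the operations involved in the proof of
Theorem 7.3 are legitimate because the various series are absolutely convergent.

As for the second term in the right member of (7.18), we have

$$\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m) = \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_s - w)(x_i - w)^2 \prod_{m=1}^{j} dF(x_m)$$

$$= \sigma^2 \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \int_{R'_{i-1}} (x_s - w) \prod_{m=1}^{i-1} dF(x_m) = \sigma^2 \sum_{s=1}^{\infty} \sum_{i=s}^{\infty} \int_{R'_i} (x_s - w) \prod_{m=1}^{i} dF(x_m). \tag{7.20}$$

We now operate on $E n(Z_n - nw)$, and obtain

$$E n(Z_n - nw) = \sum_{j=1}^{\infty} \int_{R_j} j \sum_{i=1}^{j} (x_i - w) \prod_{m=1}^{j} dF(x_m) = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} i (x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.21}$$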

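We observe that

$$\sum_{i=j}^{\infty} \int_{R_i} i (x_j - w) \prod_{m=1}^{i} dF(x_m) = j \sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) + \sum_{s=j+1}^{\infty} \sum_{i=s}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.22}$$

To evaluate the left member of (7.22), we proceed as follows: It is easy to see that

$$\sum_{i=j}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = 0. \tag{7.23}$$

Moreover, when $s > j$,

$$\sum_{i=s}^{\infty} \int_{R_i} (x_j - w) \prod_{m=1}^{i} dF(x_m) = \int_{R'_{s-1}} (x_j - w) \prod_{m=1}^{s-1} dF(x_m). \tag{7.24}$$

Hence

$$\sum_{i=j}^{\infty} \int_{R_i} i (x_j - w) \prod_{m=1}^{i} dF(x_m) = \sum_{i=j}^{\infty} \int_{R'_i} (x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.25}$$

Therefore

$$E n(Z_n - nw) = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R'_i} (x_j - w) \prod_{m=1}^{i} dF(x_m). \tag{7.26}$$

It remains now to consider the third term of the right member of (7.18).
We have

$$\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) = \sum_{s=1}^{\infty} \sum_{i=s+1}^{\infty} \sum_{j=i}^{\infty} \int_{R_j} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m). \tag{7.27}$$

Now, suppose that in the expression

$$\int_{R_j} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) \tag{7.28}$$

where $j \ge i > s$, we integrate with respect to all $x_m$ for which
$m \ge i$. Then it is not difficult to see that

$$\sum_{j=i}^{\infty} \int_{R_j} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) = 0 \tag{7.29}$$

for all $s$ and $i$ such that $1 \le s < i$. Hence from (7.27)

$$\sum_{j=2}^{\infty} \int_{R_j} \sum_{i=2}^{j} \sum_{s=1}^{i-1} (x_s - w)^2 (x_i - w) \prod_{m=1}^{j} dF(x_m) = 0. \tag{7.30}$$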




In a similar way it is shown that the fourth term of the right member of (7.18) 
is zero. 

The desired result (7.17) is a direct consequence of (7.18), (7.19), (7.20), 
(7.26), and (7.30). 
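The third-moment relation (7.17) can likewise be checked by exact enumeration
of a truncated process. The sketch below is illustrative; the function name
`third_moment_check` and the example rule are hypothetical.

```python
def third_moment_check(values, probs, stop, max_n):
    """Return both members of (7.17) by exact path enumeration.

    Left member:  E(Z_n - n w)^3.
    Right member: w_3 E n + 3 sigma^2 E n (Z_n - n w).
    """
    w = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - w) ** 2 for v, p in zip(values, probs))
    w3 = sum(p * (v - w) ** 3 for v, p in zip(values, probs))
    lhs = e_n = e_n_dev = 0.0
    stack = [((), 1.0)]  # (observations so far, probability of this path)
    while stack:
        xs, prob = stack.pop()
        if xs and (stop(xs) or len(xs) == max_n):
            dev = sum(xs) - len(xs) * w
            lhs += prob * dev ** 3
            e_n += prob * len(xs)
            e_n_dev += prob * len(xs) * dev
            continue
        for v, p in zip(values, probs):
            stack.append((xs + (v,), prob * p))
    return lhs, w3 * e_n + 3 * var * e_n_dev


if __name__ == "__main__":
    # Hypothetical rule: X = +1 or -1 with equal probability; stop after the
    # first observation if it is +1, otherwise take exactly one more.
    rule = lambda xs: xs[0] == 1 or len(xs) == 2
    print(third_moment_check([1, -1], [0.5, 0.5], rule, 5))
```

In the example the three possible outcomes are $Z_1 = 1$ with probability
$\tfrac12$, and $Z_2 = 0$ or $Z_2 = -2$ each with probability $\tfrac14$, so
that both members of (7.17) equal $-\tfrac32$.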

Consider now an infinite sequence of chance variables $X_1, X_2, \ldots$, which
need not have the same distribution and which may be dependent (in which
case they must satisfy the obvious consistency relationships). We take
successive observations on these chance variables and define a sequential
process as above, which is subject only to such restrictions as we shall
explicitly state. Let $Z_n$ maintain its previous definition.

Theorem 7.4. Suppose that

$$\nu_i = E(X_i \mid n \ge i) \tag{7.31}$$

exists for all positive integral $i$ for which $P\{n \ge i\} \ne 0$. In those
cases write

$$\nu'_i = E(|X_i - \nu_i| \mid n \ge i). \tag{7.32}$$

Suppose also that the series

$$\sum_{i=1}^{\infty} (\nu'_1 + \cdots + \nu'_i) P\{n = i\} \tag{7.33}$$

converges. Then

$$E\left[Z_n - \sum_{i=1}^{n} \nu_i\right] = 0. \tag{7.34}$$

It is regrettable but unavoidable that the mean values $\nu_i$ and $\nu'_i$
entering into (7.33) and (7.34) be conditional. The fundamental reason is that
the sequential process may drastically modify the distribution of dependent
chance variables, so that their distribution for our purposes can only be
considered in conjunction with the sequential process itself. Consider the
following example:

$$P\{X_1 = -1\} = \tfrac12, \quad P\{X_1 = 1\} = \tfrac12,$$

$$P\{X_2 = -2 \mid X_1 = -1\} = \tfrac12,$$

$$P\{X_2 = -1 \mid X_1 = -1\} = \tfrac12,$$

$$P\{X_2 = 1 \mid X_1 = 1\} = \tfrac12,$$

$$P\{X_2 = 2 \mid X_1 = 1\} = \tfrac12.$$

We have $E(X_2) = 0$. Suppose we define the following sequential process:
If $X_1 = -1$, $n = 1$, and if $X_1 = 1$, $n = 2$. It is then clear that for
our purposes $X_2$ can take no negative values and the fact that $E(X_2) = 0$
is of no use to us.
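The example can be worked through exactly. The sketch below takes every
probability in the example to be 1/2 (an assumption about the printed
fractions) and uses the rule $n = 1$ if $X_1 = -1$, $n = 2$ if $X_1 = 1$. It
computes $E(X_2)$, the conditional mean $\nu_2 = E(X_2 \mid n \ge 2)$, and the
left member of (7.34). The function name `theorem_7_4_example` is
hypothetical.

```python
from fractions import Fraction


def theorem_7_4_example():
    """Return (E X_2, nu_2, E[Z_n - sum of nu_i]) for the two-step example."""
    quarter = Fraction(1, 4)
    # Joint law of (X_1, X_2), assuming each printed probability is 1/2.
    joint = {(-1, -2): quarter, (-1, -1): quarter, (1, 1): quarter, (1, 2): quarter}

    def n_of(x1):
        return 1 if x1 == -1 else 2

    e_x2 = sum(p * x2 for (_, x2), p in joint.items())
    nu1 = sum(p * x1 for (x1, _), p in joint.items())  # n >= 1 always holds
    p_ge2 = sum(p for (x1, _), p in joint.items() if n_of(x1) >= 2)
    nu2 = sum(p * x2 for (x1, x2), p in joint.items() if n_of(x1) >= 2) / p_ge2

    total = Fraction(0)
    for (x1, x2), p in joint.items():
        if n_of(x1) == 1:
            total += p * (x1 - nu1)
        else:
            total += p * (x1 + x2 - nu1 - nu2)
    return e_x2, nu2, total


if __name__ == "__main__":
    print(theorem_7_4_example())
```

Under this assumption $E(X_2) = 0$ while $\nu_2 = \tfrac32$, and it is the
conditional mean that makes (7.34) hold.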


If, however, the chance variables $X_1, X_2, \ldots$ are independent, this
difficulty disappears, and we have the following:

Corollary 1 to Theorem 7.4. If the chance variables $X_1, X_2, \ldots$ are
independent, we have Theorem 7.4 with $\nu_i = E(X_i)$, and
$\nu'_i = E|X_i - \nu_i|$.

If further all the $X_i$ have the same distribution, we see that Theorem 7.1 is
a special case of Theorem 7.4, since the convergence of the series (7.33) is then
a consequence of the existence of $w$ and $E n$. From this argument we see,
however, that it is not necessary that all the $X_i$ have the same
distribution, and we may write the following generalization of Theorem 7.1:

Corollary 2 to Theorem 7.4. Let the $X_i$ be independent with, in general,
different distributions. Suppose, however, that all $\nu_i$ are equal, and all
$\nu'_i$ are equal, except perhaps for those $i$ such that $P\{n \ge i\} = 0$.
Suppose further that $E n$ exists. Then (7.1) holds.

Among possible fields of application of Theorem 7.4 are sequential tests of
composite statistical hypotheses, and the random walk of a particle governed
by probability distributions which are functions of time and the position of the
particle. The extension of this theorem to vector chance variables is
straightforward. The extension to higher moments may present difficulties. We
hope to return to some of these questions in the future.

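Proof of Theorem 7.4. This is very elementary. We have

$$E\left(Z_n - \sum_{i=1}^{n} \nu_i\right) = \sum_{j=1}^{\infty} \int_{R_j} \left[\sum_{i=1}^{j} (x_i - \nu_i)\right] dF(x_1, \ldots, x_j)$$

$$= \sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} (x_j - \nu_j) \, dF(x_1, \ldots, x_i)$$

$$= \sum_{j=1}^{\infty} P\{n \ge j\} E(X_j - \nu_j \mid n \ge j) = 0. \tag{7.35}$$

The rearrangement of the series is valid because

$$\sum_{j=1}^{\infty} \sum_{i=j}^{\infty} \int_{R_i} |x_j - \nu_j| \, dF(x_1, \ldots, x_i) = \sum_{j=1}^{\infty} \nu'_j P\{n \ge j\} = \sum_{j=1}^{\infty} (\nu'_1 + \cdots + \nu'_j) P\{n = j\} \tag{7.36}$$

which converges by (7.33).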


REFERENCES 

[1] David Blackwell, "On an equation of Wald," Annals of Math. Stat., Vol. 17 (1946),
pp. 84-87.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.

[3] M. A. Girshick, Frederick Mosteller, and L. J. Savage, "Unbiased estimates for
certain binomial sampling problems," Annals of Math. Stat., Vol. 17 (1946),
pp. 13-23.

[4] H. Cramér, "A contribution to the theory of statistical estimation," Skandinavisk
Aktuarietidskrift, Vol. 29 (1946), pp. 86-94.

[5] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16
(1945), pp. 117-186.

[6] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), pp. 283-296.

[7] A. Wald, "Differentiation under the expectation sign of the fundamental identity in
sequential analysis," Annals of Math. Stat., Vol. 17 (1946).

[8] C. R. Rao, "Information and the accuracy attainable in the estimation of statistical
parameters," Bull. Calcutta Math. Soc., Vol. 37, No. 3 (Sept., 1945), pp. 81-91.

[9] A. C. Aitken and H. Silverstone, "On the estimation of statistical parameters,"
Proc. Roy. Soc. Edinburgh, Vol. 61 (1941), pp. 56-62.



ESTIMATION OF LINEAR FUNCTIONS OF CELL PROPORTIONS 

By John H. Smith 
Bureau of Labor Statistics 

Summary. In this article certain contributions are made to the theory of 
estimating linear functions of cell proportions in connection with the methods 
of (1) least squares, (2) minimum chi-square, and (3) maximum likelihood. 
Distinctions among these three methods made by previous writers arise out of 
(1) confusion concerning theoretical vs. practical weights, (2) neglect of effects 
of correlation between sampling errors, and (3) disagreement concerning methods 
of minimization. Throughout the paper the equivalence of these three methods 
from a practical point of view has been emphasized in order to facilitate the 
integration and adaptation of existing statistical techniques. To this end: 

1. The method of least squares as derived by Gauss in 1821-23 [6, pp. 224- 
228] in which weights in theory are chosen so as to minimize sampling variances 
is herein called the ideal method of least squares and the theoretical estimates 
are called ideal linear estimates. This approach avoids confusion between 
practical approximations and theoretical exact weights. 

2. The ideal method of least squares is applied to uncorrelated linear functions
of correlated sample frequencies to determine the appropriate quantity
to minimize in order to derive ideal linear estimates in sample-frequency problems.
This approach leads to a sum of squares of standardized uncorrelated
linear functions of sampling errors in which statistics are to be substituted in
numerators.

3. A new elementary method is used to reduce the sum of squares in (2),
before substitution of statistics, to Pearson’s expression for chi-square. In
this result, obtained without approximation, appropriate substitution of statistics
shows that the denominators of chi-square should be treated as constant
parameters in the differentiation process in order to minimize chi-square in
conformity with the ideal method of least squares.

4. The ideal method of minimum chi-square, derived in (3) as the sample- 
frequency form of the ideal method of least squares, yields ideal linear estimates 
in terms of the unknown parameters in the denominators of chi-square. When 
these parameters are estimated by successive approximations in such a way as 
to be consistent with statistics based on them, it is shown that the method of 
minimum chi-square leads to maximum likelihood statistics. 

5. An iterative method which converges to maximum likelihood estimates is 
developed for the case in which observations are cross-classified and first order 
totals are known. In comparison with Deming’s asymptotically efficient 
statistics, it is shown that, in a certain sense, maximum likelihood statistics 
are superior for any given value of n—especially in small samples. 

6. The method of proportional distribution of marginal adjustments is developed.
This method yields estimates of expected cell frequencies whose
efficiency is 100 per cent when universe cell frequencies are proportional—a 
condition closely approximated in most practical surveys for which first order 
totals are available from complete censuses. Whether this favorable condition 
is satisfied or not, the method yields results which are easy to interpret and it 
has many computational advantages from the point of view of economy of time 
and effort. 

Throughout the article discussion is confined to the estimation of parameters 
whose relationships to cell proportions are linear. However, most of the results 
can be extended to the case of non-linear relationships, the necessary qualifications
being similar to those in curve-fitting problems when the function to be
fitted is not linear in its parameters. In this case, of course, least squares estimates
are not linear estimates. In particular, obvious extensions of the general
proofs in sections 5 and 6 make them applicable to the non-linear case. Thus
even when relationships are non-linear, it can be shown that the method of
minimum chi-square is the sample-frequency form of the method of least squares
which leads (by means of appropriate successive approximations) to maximum 
likelihood statistics in sample-frequency problems. This principle which 
establishes the equivalence of the methods of least squares, minimum chi-square, 
and maximum likelihood greatly facilitates the integration and adaptation of 
existing techniques developed in connection with these important methods of 
estimation. 

1. Introduction. This article deals with problems of statistical estimation in 
which the parameters to be estimated are cell proportions or linear functions of 
them. A simple illustration of this type of problem is that of estimating p, 
the proportion of white men in a population classified by race and sex. From
a sample of n persons selected at random from such a population, the desired
proportion can be estimated by simply taking the sample proportion of white
men as an estimate of the corresponding cell proportion in the population or
universe. This estimate is unbiased for all possible values of p and its sampling
variance is p(1 - p)/n, assuming, for simplicity, that sampling is done with
replacement. Whether a more accurate unbiased estimate of p can be derived
depends on whether or not any other relevant information concerning the cell 
proportions in the universe is available. For example, it may be known that 
all of the white portion of the population is composed of married couples so that 
in the universe the number of white men is exactly equal to the number of white 
women. This knowledge implies that half the proportion of whites provides an 
unbiased estimate of p which is far more accurate than the sample proportion 
of white men. In fact, the sampling variance of half the proportion of whites
is equal to (2p)(1 - 2p)/4n, which is less than half the sampling variance of the
proportion of white men.
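
The comparison of the two sampling variances can be checked with exact rational arithmetic. The following is only a minimal sketch: the function names and the trial values p = 1/5 and n = 100 are illustrative assumptions and do not appear in the article.

```python
from fractions import Fraction

def var_sample_proportion(p, n):
    """Sampling variance of the sample proportion of white men: p(1 - p)/n."""
    return p * (1 - p) / n

def var_half_white_proportion(p, n):
    """Sampling variance of half the sample proportion of whites: (2p)(1 - 2p)/(4n)."""
    return (2 * p) * (1 - 2 * p) / (4 * n)

# Illustrative values only; p must lie below 1/2 because 1 - 2p is a cell proportion.
p, n = Fraction(1, 5), 100
direct = var_sample_proportion(p, n)
halved = var_half_white_proportion(p, n)

# The ratio simplifies to (1 - 2p) / (2(1 - p)), which is below 1/2 for every p > 0.
ratio = halved / direct
print(direct, halved, ratio)
```

For these trial values the ratio is 3/8, in agreement with the statement that the second variance is less than half the first.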

The term ideal linear estimate will be used to refer to any statistic which satisfies
the criteria of estimation implied by the foregoing discussion; that is, an
ideal linear estimate is any estimate which (1) is a linear function of the sample
observations; (2) is recognizable as unbiased by the research worker; and (3) 
has minimum sampling variance among estimates which have properties (1) 
and (2). These important criteria of estimation will now be stated in more 
technical language. 

Let $n_1$, $n_2$, and $n_3$ represent the number of (1) white men, (2) white women
and (3) non-white persons, respectively, in samples of n persons. Since any
linear function with a constant term can be reduced to the homogeneous form
by adding an appropriate multiple of the identity

(1.1) $n_1 + n_2 + n_3 - n = 0,$

it is possible, without loss of generality, to confine attention to linear estimates
of the form

(1.2) $T = a_1 n_1 + a_2 n_2 + a_3 n_3,$

which are recognizable as unbiased. In this example, the research worker is
assumed to know that the cell proportions in the universe are

(1.3) $p_1, p_2, p_3 = p, p, 1 - 2p.$

Hence, absence of bias implies that the expected value of T

(1.4) $E(T) = a_1 n p_1 + a_2 n p_2 + a_3 n p_3 = (a_1 + a_2 - 2a_3) n p + n a_3$

is identically equal to p; in other words, that

(1.5) $n(a_1 + a_2 - 2a_3) - 1 = 0$ and $n a_3 = 0.$

The ideal linear estimate is derived by finding values of $a_1$, $a_2$, and $a_3$ which
minimize the sampling variance of T subject to equations (1.5) as side conditions.¹
In this way it can be shown that half the sample proportion of whites
is actually the ideal linear estimate of p. For more general problems, the
process of minimization of sampling variances with the aid of Lagrange multipliers
involves expressions which are complicated algebraically. For this reason
it is usually easier to derive ideal linear estimates of parameters which are linear
functions of cell proportions by the ideal method of least squares which is
presented in section 3.
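
The minimization described in the footnote can be sketched numerically. The code below is an illustration under stated assumptions, not the author's computation: the function names and the trial values p = 1/5 and n = 100 are hypothetical, the variances and covariances are the usual multinomial ones for sampling with replacement, and the quadratic in the single remaining coefficient is recovered from three exact evaluations.

```python
from fractions import Fraction

def variance_of_T(a1, a2, a3, p, n):
    """Exact sampling variance of T = a1*n1 + a2*n2 + a3*n3 for a multinomial
    sample of size n with cell proportions (p, p, 1 - 2p)."""
    a = (a1, a2, a3)
    probs = (p, p, 1 - 2 * p)
    total = Fraction(0)
    for i in range(3):
        for j in range(3):
            # Covariance of n_i and n_j: n*p_i*(1 - p_i) if i == j, else -n*p_i*p_j.
            cov = n * probs[i] * ((1 if i == j else 0) - probs[j])
            total += a[i] * a[j] * cov
    return total

def ideal_coefficients(p, n):
    """Use the side conditions (a3 = 0, a2 = 1/n - a1) and minimize the
    resulting quadratic A*a1**2 + B*a1 + C in a1."""
    c = Fraction(1, n)
    f0 = variance_of_T(Fraction(0), c, 0, p, n)
    f1 = variance_of_T(Fraction(1), c - 1, 0, p, n)
    f2 = variance_of_T(Fraction(2), c - 2, 0, p, n)
    A = (f2 - 2 * f1 + f0) / 2      # second difference gives the leading coefficient
    B = f1 - f0 - A
    a1 = -B / (2 * A)               # vertex of the parabola
    return a1, c - a1, Fraction(0)

p, n = Fraction(1, 5), 100          # illustrative values only
a1, a2, a3 = ideal_coefficients(p, n)
print(a1, a2, a3, variance_of_T(a1, a2, a3, p, n))
```

With these values both non-zero coefficients equal 1/(2n), so that T is half the sample proportion of whites, and the minimized variance equals (2p)(1 - 2p)/4n.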

¹ In this example, it is possible to solve equations (1.5) for $a_2$ in terms of $a_1$, drop subscripts,
and substitute in the formula for the sampling variance of T to obtain a quadratic
in a to be minimized.

Like other least squares estimates, an ideal linear estimate of a linear function
of cell proportions depends on ideal least squares weights. Since these weights
are, in general, functions of variances and covariances of sample frequencies,
the theoretical connotation of the term “ideal” makes it preferable to other 
terms such as “optimum” and “best.” In this connection it should be emphasized
that (1) the sampling variance of linear estimates is insensitive to
small errors in estimating ideal weights, and (2) the process of deriving practical
approximations to ideal linear estimates automatically provides maximum
likelihood estimates of the ideal weights. Thus the estimation of weights is
perfectly objective and the best practical approximations to ideal linear estimates
are expressed in terms of sample observations. This degree of objectivity
is rare in statistical estimation as a brief consideration of regression problems
will illustrate.

In ordinary regression problems, the ideal weights are inversely proportional 
to error variances. It is usually necessary to draw upon past experience to 
estimate relative weights because satisfactory estimates of error variances 
are rarely available in terms of sample observations. From the present point 
of view, the widespread use of equal weights implies the subjective “assumption” 
that all error variances are equal. (Maximum likelihood estimates of regression
coefficients require, in addition, the even more subjective assumption of normality.)
In spite of these (usually implicit) subjective assumptions, discussions
of optimum properties of least squares regression coefficients based on
ideal weights in terms of unknown parameters are highly commendable because
(1) sampling variance is not very sensitive to small errors in weights and (2)
properties of theoretical ideal linear estimates furnish a simple basis for discussion
of the properties of practical statistics based on any reasonably good
approximations to the exact ideal weights. In any case, it is important to 
know what the ideal weights are in terms of unknown parameters because 
research workers can make better estimates if they know what quantities should 
be estimated than they could otherwise. 
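
The insensitivity of sampling variance to small errors in weights can be illustrated with a weighted mean of two uncorrelated observations. This is a hypothetical example: the error variances (1 and 4), the misjudged weight, and the function name are assumptions chosen only for illustration.

```python
def variance_of_weighted_mean(weights, variances):
    """Variance of sum(w*x)/sum(w) for uncorrelated observations x."""
    total = sum(weights)
    return sum((w / total) ** 2 * v for w, v in zip(weights, variances))

variances = [1.0, 4.0]                 # hypothetical error variances
ideal = [1.0 / v for v in variances]   # ideal weights, inversely proportional to variance
rough = [1.0, 0.30]                    # second weight misjudged by 20 per cent
equal = [1.0, 1.0]                     # the familiar equal weights

v_ideal = variance_of_weighted_mean(ideal, variances)
v_rough = variance_of_weighted_mean(rough, variances)
v_equal = variance_of_weighted_mean(equal, variances)
print(v_ideal, v_rough, v_equal)
```

In this sketch an error of 20 per cent in one weight raises the variance by less than one per cent, whereas equal weights raise it by more than half.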

2. Estimation of a single parameter. In sample-frequency problems, least 
squares weights are rarely given explicitly or even implied by information 
available to the research worker. Since the hypothetical example used in
Section 1 is a trivial special case from this point of view, a more realistic example
is presented in this section. Since the biological interpretation of this
problem is presented in detail in all but the first of the many editions of Fisher’s 
well-known book [3] it is sufficient here to consider only the statistical problem. 
The four cell proportions are

(2.1) $p_1, p_2, p_3, p_4 = (2 + \theta)/4, (1 - \theta)/4, (1 - \theta)/4, \theta/4,$

and the parameter $\theta$ is to be estimated from the set of sample frequencies

(2.2) $n_1, n_2, n_3, n_4 = 1997, 906, 904, 32,$

obtained in a sample of n = 3839 selected at random from an infinite universe.
Fisher considers five different statistics ($T_1$, $T_2$, $T_3$, $T_4$, and $T_5$) so it will
be convenient to use the symbol $T_6$ for the ideal linear estimate. Consider
the class of linear unbiased estimates of the form

(2.3) $T = a_1 n_1 + a_2 n_2 + a_3 n_3 + a_4 n_4,$

where absence of bias implies that

(2.4) $2a_1 + a_2 + a_3 = 0$ and $a_1 - a_2 - a_3 + a_4 - 4/n = 0.$

Minimizing the sampling variance of T in equation (2.3) subject to side
conditions based on equations (2.4) yields the ideal linear estimate $T_6$ defined
by the equation

(2.5) $n(1 + 2\theta) T_6 = 3\theta n_1 - 3\theta n_2 - 3\theta n_3 + (4 - \theta) n_4.$

The exact sampling variance of $T_6$,

(2.6) $\frac{2\theta(1 - \theta)(2 + \theta)}{n(1 + 2\theta)},$

is used by Fisher as the asymptotic sampling variance of any efficient estimate
of $\theta$. The exact sampling variance of the ideal linear estimate is especially
appropriate as the asymptotic sampling variance of the maximum likelihood
estimate $T_4$ because $T_4$ is the limit of an iterative process designed to estimate
$T_6$ as closely as possible from sample data by using successive approximations
to $T_6$ for $\theta$ in equation (2.5). The limit of this process (which is, of course,
only an approximation to $T_6$) can be obtained by substituting the symbol $T_4$
for both $T_6$ and $\theta$ in equation (2.5) and solving the resulting quadratic equation
which can be reduced to

(2.7) $n T_4^2 - (n_1 - 2n_2 - 2n_3 - n_4) T_4 - 2n_4 = 0,$

an equation which is identical, except for notation, with Fisher’s equation of
maximum likelihood of which $T_4$ is the positive solution.
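
The iterative process can be sketched as follows, using the sample frequencies of equation (2.2). The function names, the starting value 0.5, and the number of iterations are assumptions made for illustration; the article does not prescribe them.

```python
import math

COUNTS = (1997, 906, 904, 32)   # sample frequencies from equation (2.2)

def linear_estimate(theta, counts=COUNTS):
    """Solve (2.5) for the linear estimate, treating theta in the weights as known."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    return (3 * theta * (n1 - n2 - n3) + (4 - theta) * n4) / (n * (1 + 2 * theta))

def iterate(start, steps=50, counts=COUNTS):
    """Feed each estimate back into the weights of (2.5)."""
    theta = start
    for _ in range(steps):
        theta = linear_estimate(theta, counts)
    return theta

def positive_root(counts=COUNTS):
    """Positive solution of the quadratic n*T**2 + b*T - 2*n4 = 0 from (2.7)."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    b = -(n1 - 2 * n2 - 2 * n3 - n4)
    return (-b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)

limit = iterate(0.5)
root = positive_root()
print(limit, root)
```

In this sketch the limit of the iteration and the positive root of (2.7) coincide to within rounding error, and both agree with the value 0.035712 quoted later in this section.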

The foregoing result is a comparatively simple illustration of the general 
principle that the maximum likelihood estimate of any linear function of cell 
proportions is the limit of an iterative process designed to approximate the 
corresponding linear estimate as closely as possible by means of sample frequencies.
Since the accuracy of estimates of least squares relative weights
increases with size of sample, maximum likelihood statistics have, in an asymptotic
sense for large samples, the same optimum properties which are possessed
in an exact sense (even for small samples) by the corresponding ideal linear
estimates. Thus the results obtained by means of the theory of large samples
are supported by the approach to estimation problems by means of ideal linear
estimates. In addition, the latter approach facilitates the integration of
available techniques as explained in later sections. 





It is true that the optimum properties of maximum likelihood statistics can 
be presented in terms of the theory of large samples, but the fact that a given 
method of estimation yields a statistic whose asymptotic sampling variance is 
a minimum does not imply that the same technique will yield a minimum 
variance statistic for any given small value of n. For example, it is well known 
that the median is a maximum likelihood estimate of the midpoint of a double 
exponential universe. Nevertheless, in samples of three observations from 
such a universe, another statistic—4/9 of the mean plus 5/9 of the median— 
has greater relative advantage over the median than the median has over the 
mean. 

Fisher’s discussion of the relative efficiencies of his five alternative consistent 
statistics suggests that it is impossible to formulate objective criteria for making 
choices among alternative statistics such that each statistic will be used whenever 
its sampling variance is smallest. Consider the sequence of universes generated
by letting $\theta$ vary from zero to unity. In general, each value of $\theta$ would determine
which of Fisher’s five statistics would have smallest sampling variance
for that particular universe for any given value of n. In comparison with
any other single statistic, the statistic $T_4$ would usually have smaller sampling
variance, but there are notable exceptions. For example, in the absence of
linkage when $\theta$ is equal to one-fourth, the statistic $T_2$ is the ideal linear estimate
and its sampling variance is smaller than that of $T_4$, at least for certain small
values of n. For this reason, Fisher used $T_2$ in preference to $T_4$ as the basis for
testing the significance of linkage. The statistic $T_5$, derived by Fisher’s method
of minimum chi-square, is also of special interest. Fisher’s method of minimum
chi-square yields statistics which differ from the corresponding maximum
likelihood statistics because Fisher considers the denominators as variables in
the process of differentiation instead of considering them as unknown parameters
to be estimated by identifying them with the corresponding statistics
in the numerators after differentiation. Arguments of later sections tend to
show that the latter method is more appropriate. In this example, it can be
shown that if $T_5$ were substituted for the corresponding parameter in the denominators
of chi-square (and treated as a parameter) the minimization of chi-square
with respect to statistics in its numerators would be exactly equivalent
to substituting 0.035785, the numerical value of $T_5$, for $\theta$ in equation (2.5) and
solving for $T_6$ to obtain 0.035717, a value which is much closer to 0.035712,
the numerical value of the maximum likelihood estimate $T_4$, than to Fisher’s $T_5$.
In problems of estimation chi-square should be minimized in order to obtain 
efficient statistics—not to obtain a small criterion for testing goodness of fit— 
and it should be minimized in a manner consistent with this purpose. Whether 
or not it is possible to derive an even smaller value for a quantity called chi- 
square should be considered to be irrelevant in either estimation problems or 
tests of significance. It is difficult to present these ideas in more technical 
language because it is possible to construct trivial hypothetical universes for 
which Fisher’s method of minimum chi-square provides statistics which are 





superior in certain respects to the corresponding maximum likelihood statistics. 
Nevertheless, it seems clear that the ideal linear estimate usually has smaller 
sampling variance than the maximum likelihood statistic which, in turn, usually 
has smaller sampling variance than any other given practical statistic. Evidence
presented in later sections tends to show that these advantages are more
important in small samples than in cases in which the theory of large samples 
is applicable. 
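
The arithmetic quoted in this section for Fisher's example can be reproduced in a few lines. The sketch below assumes only equation (2.5), the sample frequencies of equation (2.2), and the two numerical values quoted in the text; the function and variable names are hypothetical.

```python
def ideal_linear_estimate(theta, counts):
    """Solve (2.5) for the linear estimate when theta in the weights is held fixed."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    return (3 * theta * (n1 - n2 - n3) + (4 - theta) * n4) / (n * (1 + 2 * theta))

counts = (1997, 906, 904, 32)   # equation (2.2)
t5 = 0.035785                   # Fisher's minimum chi-square value, as quoted in the text
t4 = 0.035712                   # maximum likelihood value, as quoted in the text

# One substitution of t5 into the weights of (2.5).
one_step = ideal_linear_estimate(t5, counts)
print(round(one_step, 6), abs(one_step - t4), abs(one_step - t5))
```

A single substitution already moves the estimate most of the way from Fisher's minimum chi-square value toward the maximum likelihood value.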

3. The “ideal” method of least squares. When sample observations are 
uncorrelated in successive samples and parameters to be estimated are linear 
functions of the expected values of the sample observations, the method of least 
squares yields ideal linear estimates of the parameters provided that the weight of
each observation is inversely proportional to its variance in successive samples. 
Although the minimum sampling variance property among linear unbiased 
estimates is seldom stressed, this principle of weighting has been presented in 
connection with the method of least squares for more than a hundred years. 
In order to emphasize the theoretical nature of weights which depend on variances
which are usually unknown in practice and to distinguish the method
based on such weights from the more familiar method of least squares with 
equal weights, the method which yields ideal linear estimates will be called the 
ideal method of least squares. 

Discussion of the general problem of estimating linear functions of cell proportions
can be facilitated by making use of results obtained by other writers,
notably Gauss (as reported by Whittaker and Robinson [6]) and Pearson [4].
According to Whittaker and Robinson, “the first writer to connect the method
[of ideal least squares] with the theory of probability was Gauss” [6, p. 224].
In his Theoria Motus proof of 1809, Gauss derived the “most probable value”
[6, p. 223] of a parameter (i.e., the statistic which satisfies the criterion now
called maximum likelihood) for the case in which sample observations are statistically
independent and normally distributed. In his Theoria Combinationis
proof of 1821-23, Gauss “abandoned the ‘metaphysical’ basis” [6, p. 220] of
his earlier work and derived the method herein called the ideal method of least
squares (without approximation) from the criteria of (1) minimum variance and
(2) absence of bias for the case in which “the mean value of [the covariance of
a pair of errors] is zero” [6, p. 224]. Since the covariances of uncorrelated linear
functions are zero whether they are statistically independent or not, it follows
from the work of Gauss that the ideal method of least squares applied to uncorrelated
linear functions of sample frequencies yields ideal linear estimates.
In other words, the ideal method of least squares implies the following six steps:

1. From the set of k + 1 sample frequencies construct k linear functions
which are uncorrelated in successive samples.

2. From each function subtract its expected value in terms of the unknown
parameters to find its sampling error.

3. Write the ratio of each sampling error to its own standard error in the
form of a fraction.

4. Sum the squares of these standardized uncorrelated sampling errors to
obtain a quantity called chi-square.

5. Substitute statistics² for the parameters in the numerators of chi-square.

6. Minimize the sum of squares of residuals with respect to each statistic
in turn (subject to appropriate side conditions in case linear functions
not implied in preceding steps are known).

This series of six steps can be summarized by the single statement that the
function to minimize is the sum of squares of standardized uncorrelated residuals.
Actually this statement is oversimplified because even though sampling
errors are both uncorrelated and standardized, the corresponding residuals 
are, in general, neither standardized nor uncorrelated. 

4. Pearson’s expression for chi-square. As defined by Pearson [4], chi-square
is the sum of squares of a set of k standardized uncorrelated linear functions
of sampling errors in a set of k + 1 correlated sample frequencies. A set
of k standardized uncorrelated linear functions can be constructed in an infinite 
number of ways, but each set can be obtained from any of the others by means 
of an orthogonal transformation. Thus the sum of squares is the same no 
matter what set is originally chosen. As his set of standardized uncorrelated 
linear functions, Pearson chose those determined by the axes of the correlation 
ellipse for which he gave the required sum of squares in terms of “minors” or 
cofactors of the correlation determinant of the first k sample frequencies. Pearson
reduced this complicated expression to the now familiar form

(4.1) $\chi^2 = \sum_{i=1}^{k+1} (n_i - n p_i)^2 / (n p_i),$

where $p_i$ is the proportion in the ith cell in the universe and $n_i$ is the frequency
in the ith cell of a sample of n observations selected at random from an infinite
universe (or with replacements from a finite universe).
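
Equation (4.1) translates directly into a short function. The sketch below uses exact rational arithmetic; the function name and the trial frequencies are illustrative assumptions.

```python
from fractions import Fraction

def pearson_chi_square(counts, proportions):
    """Pearson's expression (4.1): sum of (n_i - n*p_i)**2 / (n*p_i) over all k + 1 cells."""
    n = sum(counts)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(counts, proportions))

# Hypothetical universe with three cells and a sample of 60 observations.
proportions = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
chi_square = pearson_chi_square([12, 18, 30], proportions)
print(chi_square)
```

Here the expected frequencies are 10, 20, and 30, so that the value is 4/10 + 4/20 + 0 = 3/5.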

The widespread misunderstanding of the nature of chi-square seems to be 
based primarily on the facts that 

1. Pearson’s rule for degrees of freedom is inadequate (see section 5), and

2. Pearson’s expression for chi-square can be derived by approximate methods 
as well as by exact methods. 

Pearson’s derivation of the expression for chi-square by exact methods is sufficient
to show that its derivation by approximate methods involves a paradox
in which different sets of approximations offset each other; however, Pearson’s
article is relatively inaccessible and, in addition, his algebraic reductions involve
the minors of a general determinant of the kth order. For these reasons, the
following exact derivation is presented in terms of elementary algebra.

² It is convenient to call these variable symbols “statistics”; the quantities whose
squares are summed, “residuals”; and the whole expression “chi-square,” even though,
from a certain point of view, these terms are strictly applicable only after the minimization
process. This usage should always be clear from its context.

Since the sum of squares is the same for any set of k standardized uncorrelated 
linear functions of the sampling errors in k + 1 correlated frequencies, a set should 
be chosen for which the algebraic reductions are as easy as possible. From this 
point of view a satisfactory set, which can be written in any of three forms, is 
given by 

(4.2) $y_i = p_i n_{i+} - p_{i+} n_i = p_i e_{i+} - p_{i+} e_i = -p_i e_{i-} - (p_i + p_{i+}) e_i,$

where $e_i = n_i - n p_i$ and i+ and i- refer to classes formed by combining all
classes above the ith class and below the ith class, respectively.

By means of the known variances and covariances of the sample frequencies 
in expected value form, 

(4.3) $E(e_i^2) = n p_i (1 - p_i),$

and

(4.4) $E(e_i e_j) = -n p_i p_j,$

it can be shown that the variance of $y_i$ is

(4.5) $E(y_i^2) = n p_i p_{i+} (p_i + p_{i+}),$

and, by using the third expression in equation (4.2) for $y_i$ and the second for
$y_j$, it can be shown that any pair of y’s are uncorrelated because

(4.6) $E(y_i y_j) = 0, \quad (i < j).$
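
Equations (4.5) and (4.6) can be verified exactly for any particular set of cell proportions. The sketch below is an illustration only: the function names, the four trial proportions, and the sample size n = 7 are assumptions, and the second form of (4.2) is used to express each $y_i$ in terms of the sampling errors.

```python
from fractions import Fraction

def error_covariance(p, n):
    """Matrix of E(e_i e_j) from (4.3) and (4.4)."""
    cells = len(p)
    return [[n * p[i] * ((1 if i == j else 0) - p[j]) for j in range(cells)]
            for i in range(cells)]

def y_coefficients(p, i):
    """Coefficients of y_i = p_i*e_{i+} - p_{i+}*e_i on (e_1, ..., e_{k+1}); i is 0-based."""
    p_plus = sum(p[i + 1:])
    coeffs = [Fraction(0)] * len(p)
    coeffs[i] = -p_plus
    for j in range(i + 1, len(p)):
        coeffs[j] = p[i]            # e_{i+} is the sum of the errors above cell i
    return coeffs

def cross_moment(u, v, cov):
    """Expected value of the product of two linear functions of the errors."""
    return sum(u[a] * v[b] * cov[a][b]
               for a in range(len(u)) for b in range(len(v)))

# Hypothetical universe with k + 1 = 4 cells.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = 7
cov = error_covariance(p, n)
ys = [y_coefficients(p, i) for i in range(len(p) - 1)]
variances = [cross_moment(y, y, cov) for y in ys]
print(variances, cross_moment(ys[0], ys[1], cov))
```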


Let $z_i$ represent the variable $y_i$ expressed in standard-deviation units. The
square of this standardized uncorrelated linear function of correlated sampling
errors can be written

(4.7) $z_i^2 = \frac{(p_i e_{i+} - p_{i+} e_i)^2}{n p_i p_{i+} (p_i + p_{i+})}.$

It remains to show that Pearson’s expression for chi-square can be obtained
by adding the k values of $z_i^2$ in succession. For this purpose it is convenient
to define

(4.8) $\chi_r^2 = \sum_{i=1}^{r} \frac{e_i^2}{n p_i} + \frac{e_{r+}^2}{n p_{r+}},$

obtained by combining all classes above the rth class.

When r = k, the expression in equation (4.8) is the expression to be derived.
It remains to show that $\chi_k^2$ is the sum of squares of k standardized uncorrelated
linear functions of sampling errors; i.e.,

(4.9) $\chi_k^2 = \sum_{i=1}^{k} z_i^2.$

For the first cell $e_{1+} = -e_1$ and $p_{1+} = 1 - p_1$. Hence $y_1$ reduces to the negative
of the error in the first frequency and

(4.10) $\chi_1^2 = \frac{e_1^2}{n p_1 (1 - p_1)} = \frac{e_1^2}{n p_1} + \frac{e_{1+}^2}{n p_{1+}}, \quad (p_{1+} = 1 - p_1),$

a special case expressed in the required form. The general case is established
by showing that

(4.11) $\chi_{r-1}^2 + z_r^2 = \chi_r^2,$

or, alternatively, that

(4.12) $z_r^2 = \chi_r^2 - \chi_{r-1}^2$
       $= \frac{e_r^2}{n p_r} + \frac{e_{r+}^2}{n p_{r+}} - \frac{(e_r + e_{r+})^2}{n (p_r + p_{r+})}$
       $= \frac{(p_{r+} e_r^2 + p_r e_{r+}^2)(p_r + p_{r+}) - p_r p_{r+} (e_r^2 + 2 e_r e_{r+} + e_{r+}^2)}{n p_r p_{r+} (p_r + p_{r+})}$
       $= \frac{p_r^2 e_{r+}^2 - 2 p_r p_{r+} e_r e_{r+} + p_{r+}^2 e_r^2}{n p_r p_{r+} (p_r + p_{r+})} = \frac{(p_r e_{r+} - p_{r+} e_r)^2}{n p_r p_{r+} (p_r + p_{r+})},$

thus establishing the derivation of Pearson’s expression for chi-square.
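
The identity just derived can be checked on a particular sample. The following sketch compares Pearson's expression (4.1) with the sum of the k values of $z_i^2$ from (4.7), using exact rational arithmetic; the function names and the trial frequencies are assumptions made for illustration.

```python
from fractions import Fraction

def pearson_chi_square(counts, p):
    """Pearson's expression (4.1)."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, p))

def sum_of_squared_z(counts, p):
    """Sum over the first k cells of z_i**2 as given by (4.7)."""
    n = sum(counts)
    e = [c - n * q for c, q in zip(counts, p)]
    total = Fraction(0)
    for i in range(len(p) - 1):
        p_plus = sum(p[i + 1:])     # proportion in all classes above the ith
        e_plus = sum(e[i + 1:])     # sampling error in all classes above the ith
        y = p[i] * e_plus - p_plus * e[i]
        total += y ** 2 / (n * p[i] * p_plus * (p[i] + p_plus))
    return total

# Hypothetical universe with four cells and a sample of 100 observations.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
counts = [12, 18, 30, 40]
lhs = pearson_chi_square(counts, p)
rhs = sum_of_squared_z(counts, p)
print(lhs, rhs)
```

For this sample the three values of $z_i^2$ are 4/9, 7/45, and 0, and their sum equals Pearson's expression, 3/5.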

When sampling is done without replacement each variance and covariance
is multiplied by (N - n)/(N - 1) where N is the number of observations in
the universe. Hence, chi-square for this case can be written

(4.13) $\chi^2 = \frac{N - 1}{N - n} \sum_{i=1}^{k+1} \frac{e_i^2}{n p_i}.$

This expression shows that the factor involving sampling errors is the same
whether sampling is done with replacement or without replacement. Hence,
the derivation of least squares statistics is the same for either method of sampling,
but sampling variances for the simpler case are multiplied by the factor
(N - n)/(N - 1) when sampling is done without replacement.
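
The factor (N - n)/(N - 1) can be checked by direct enumeration for a single cell. The sketch below is illustrative: the function name and the trial universe (N = 20 units, of which 8 fall in the cell, sampled n = 5 at a time) are assumptions.

```python
from fractions import Fraction
from math import comb

def cell_variance_without_replacement(N, M, n):
    """Exact variance of the sample frequency in a cell holding M of the N units
    when n units are drawn without replacement."""
    ways = comb(N, n)
    # comb returns 0 when more units are requested than are available,
    # so impossible outcomes receive zero probability automatically.
    probs = {x: Fraction(comb(M, x) * comb(N - M, n - x), ways)
             for x in range(min(M, n) + 1)}
    mean = sum(x * q for x, q in probs.items())
    return sum((x - mean) ** 2 * q for x, q in probs.items())

N, M, n = 20, 8, 5
p = Fraction(M, N)
with_replacement = n * p * (1 - p)
without_replacement = cell_variance_without_replacement(N, M, n)
print(with_replacement, without_replacement, without_replacement / with_replacement)
```

The ratio of the two variances in this example is 15/19, which is (N - n)/(N - 1).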


5. The method of minimum chi-square. The derivation of Pearson’s expression
for chi-square completes the first four steps of the ideal method of least
squares outlined in section 3. Hence, the method of minimum chi-square is
the sample-frequency form of the ideal method of least squares in which only
two of the six steps remain to be taken.

In his original article [4] Pearson pointed out that the use of statistics instead 
of parameters would affect the value of chi-square but that such effects would 
usually be so small that no allowance need be made for them in connection with 
tests of significance. It is now well known that the average value of chi-square 





is reduced approximately one unit for each parameter estimated from the sample, 
and that the main portion of this effect is on the numerators; i.e., in large samples 
the effect of substituting statistics for parameters in the denominators usually 
has a negligible effect on the value of chi-square. By confining the discussion 
to the case in which parameters are used in the denominators, it is possible to 
make simple exact statements concerning the main effects in terms of the number 
of squares of standardized uncorrelated linear functions—also known as the 
number of degrees of freedom and the mean value of chi-square. 

When the expected values in the numerators of chi-square can be expressed
as linear functions of r algebraically independent parameters, ideal linear estimates
of the r parameters are determined by substituting statistics for the r
parameters and minimizing the resulting expression with respect to each statistic.
In general, such a substitution of statistics for parameters in the numerators
of chi-square reduces the number of degrees of freedom by one unit for
every parameter estimated; that is, the appropriately minimized chi-square
can be analyzed into k - r squares of standardized uncorrelated linear functions
of sampling errors.

The r ideal linear estimates are linear functions of the sample frequencies.
Let ($v_1, v_2, \ldots, v_r$) be a set of standardized uncorrelated linear functions of
the correlated sampling errors in these statistics and let ($v_1, v_2, \ldots, v_k$) be a set
of linear functions obtained from the $z_i$’s of section 4 by an orthogonal transformation.
Since the sum of squares is not changed by such a transformation,
chi-square is the sum of the k values of $v_i^2$. The process of substituting statistics
for the r parameters in the numerators of chi-square reduces the values of
the first r $v_i$’s to zero without affecting the values of the other (k - r) $v_i$’s.

Thus the appropriately minimized chi-square can be analyzed into k - r
squares of standardized uncorrelated linear functions of sampling errors and is
therefore said to have k - r degrees of freedom. The mean value of each square
is the variance of a standardized linear function of sampling errors and is therefore
unity by definition. Hence the mean value of the appropriately minimized
chi-square (with parameters in the denominators) is exactly k - r when r
statistics are estimated from a set of k + 1 sample frequencies.
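
The exactness of this mean value can be confirmed by enumeration in the simplest case, r = 0, in which no statistics are substituted and parameters appear in both numerators and denominators. The sketch below covers only that case; the function name, the three trial proportions, and the sample size n = 4 are assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def expected_chi_square(p, n):
    """Exact mean of Pearson's chi-square over all multinomial samples of size n."""
    cells = len(p)
    mean = Fraction(0)
    for counts in product(range(n + 1), repeat=cells):
        if sum(counts) != n:
            continue
        # Multinomial probability of this set of sample frequencies.
        prob = Fraction(factorial(n))
        for c, q in zip(counts, p):
            prob *= q ** c / factorial(c)
        chi = sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, p))
        mean += prob * chi
    return mean

# Hypothetical universe with k + 1 = 3 cells, so that k = 2.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mean_value = expected_chi_square(p, 4)
print(mean_value)
```

The mean is exactly k = 2 even for a sample of only four observations, which illustrates that the statement does not depend on the theory of large samples.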

The expression to be minimized is

(5.1) $\chi^2 = \sum_i \frac{(n_i - m'_i)^2}{n p_i},$

where $m'_i$ is the ideal linear estimate of $n p_i$. The set of statistics described
by the equation

(5.2) $m'_i = n_i,$

reduces the value of chi-square to zero, its minimum value. This shows that
the sample cell proportion is the ideal linear estimate of the corresponding
parameter.

Whenever a linear function independent of the sum of the cell proportions is 





known, it is possible to take advantage of additional information provided by 
the known function by minimizing chi-square subject to an appropriate side 
condition. When side conditions are used in this way, the number of degrees 
of freedom for the minimized chi-square is equal to the number of side conditions 
which are algebraically independent of each other (and of the sum of the cell 
proportions). Let the known linear function be written 

(5.3) $\sum a_i n p_i - m = 0.$

In order to facilitate comparison of the typical equation of maximization
with the corresponding equation of the method of maximum likelihood, it is
convenient to minimize chi-square by maximizing $-\chi^2/2$ subject to a side
condition based on (5.3). The function to be maximized can be written

(5.4) $-\chi^2/2 = \sum (n_i - m'_i)^2 / (-2 n p_i) + h (\sum a_i m'_i - m),$

where h is a Lagrange multiplier. Setting the partial derivative of $-\chi^2/2$
with respect to $m'_i$ equal to zero, the typical equation for minimizing chi-square
can be written

(5.5) $\frac{n_i - m'_i}{n p_i} + h a_i = 0,$

a form which shows that, in general, ideal linear estimates are defined in terms
of unknown parameters. Fortunately, these parameters can usually be approximated
closely by an iterative process. Substituting $m_i$ for both $n p_i$ and $m'_i$
in equations (5.5) the typical equation in the limiting values of such a process
can be reduced to

(5.6) $n_i/m_i - 1 + h a_i = 0,$
a form which is identical with the typical equation (6.6) of maximum likelihood 
derived in section 6. This equality of typical equations implies that whenever 
the denominators of chi-square are estimated in such a way as to be consistent 
with least squares statistics based on them, the method of minimum chi-square 
always leads (by means of approximations necessary in practice) to maximum 
likelihood estimates of parameters which are linear functions of cell proportions. 
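
The iterative process can be sketched for a single homogeneous side condition. The example is hypothetical: the side condition $m_1 - 2m_2 = 0$ (that is, m = 0 in equation (5.3)), the sample frequencies, the function name, and the choice of the sample frequencies as starting denominators are all assumptions made for illustration.

```python
def minimum_chi_square_iteration(counts, a, steps=25):
    """Solve (5.5) repeatedly: m'_i = n_i + h*a_i*d_i, with h chosen so that
    sum(a_i*m'_i) = 0, and with the denominators d_i replaced at each step
    by the latest estimates."""
    d = [float(c) for c in counts]      # starting denominators: the sample frequencies
    for _ in range(steps):
        h = (-sum(ai * ci for ai, ci in zip(a, counts))
             / sum(ai * ai * di for ai, di in zip(a, d)))
        d = [ci + h * ai * di for ci, ai, di in zip(counts, a, d)]
    return d

counts = (30, 50, 20)                   # hypothetical sample frequencies
a = (1, -2, 0)                          # hypothetical side condition m_1 - 2*m_2 = 0
estimates = minimum_chi_square_iteration(counts, a)

# Maximum likelihood estimates under p_1 = 2*p_2, obtained in closed form
# by pooling the first two cells.
pooled = (counts[0] + counts[1]) / 3
maximum_likelihood = [2 * pooled, pooled, float(counts[2])]
print(estimates, maximum_likelihood)
```

In this sketch the limiting values agree with the maximum likelihood estimates and add to n, although the estimates at the first step do not.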

6. The method of maximum likelihood. Maximum likelihood estimates of 
linear functions of cell proportions can be obtained by (1) expressing the prob¬ 
ability function (general term of the multinomial expansion) in terms of the r 
parameters to be estimated; (2) substituting r statistics for the r parameters; 
and (3) maximizing with respect to the r statistics. In practice, this is usually 
accomplished by maximizing the logarithm of the variable factor in step (3) 
which can be written, 

(6.1) L = 2n<logra,-, 

where mi is the maximum likelihood estimate of npi , the expected value of the 
ith frequency n f in a sample of n observations classified into (k + 1) classes or 



ESTIMATION OF LINEAR FUNCTIONS 


243 


cells. It is evident that L as written has no maximum with respect to any $m_i$,
since it increases without bound as $m_i$ increases, but it sometimes has a uniquely
determined maximum when each of the $m_i$'s is written explicitly in terms of
fewer than k + 1 algebraically independent statistics. In the general case it is
easier to maximize L subject to an appropriate set of side conditions, one of 
which must be equivalent to 

(6.2) $m_1 + m_2 + \cdots + m_{k+1} - n = 0$.

When no linear function except the sum is known, the likelihood function 
can be written 

(6.3) $L = \sum n_i \log m_i - (\sum m_i - n)$,

a function which, subject to equation (6.2), is always equal to that in equation 
(6.1) but which has a uniquely determined maximum. The typical equation of 
maximum likelihood, obtained by setting the partial derivative of L with respect 
to $m_i$ equal to zero, is

(6.4) $n_i/m_i - 1 = 0$,

an equation which shows that each sample frequency is a maximum likelihood 
estimate of its own expected value. 

When a linear function such as that in equation (5.3) is known, an improved 
set of maximum likelihood statistics can be found by maximizing 

(6.5) $L = \sum n_i \log m_i - (\sum m_i - n) + h(\sum a_i m_i - m)$.

The typical equation of maximization is found to be 

(6.6) $n_i/m_i - 1 + h a_i = 0$,

an equation which, as stated above, is identical with equation (5.6). Since
equation (5.6) was obtained as the limit of an iterative process from the typical
equation (5.5) for minimizing chi-square subject to the same side condition
and since each additional side condition affects the typical equation of each 
method in exactly the same way, the method of minimum chi-square and the 
method of maximum likelihood are equivalent for the general case in the sense 
that the method of minimum chi-square always leads to maximum likelihood 
statistics as limits of an iterative process. 
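The equivalence stated in sections 5 and 6 can be checked numerically. The following is a minimal sketch under stated assumptions, not the author's computing scheme: the function names, the three-cell sample (1, 3, 6), and the homogeneous side condition $m_1 - m_2 = 0$ are hypothetical, and the sum condition (6.2) is given its own multiplier `lam`. Each pass minimizes chi-square with fixed denominators $c_i$ and then reuses the adjusted frequencies as the next denominators.

```python
def min_chi_square_step(n, a, c):
    """One least-squares adjustment with fixed chi-square denominators c.

    Minimizes sum((n_i - m_i)**2 / c_i) subject to sum(m_i) = sum(n_i)
    and sum(a_i * m_i) = 0 (hypothetical homogeneous side condition).
    """
    s_c = sum(c)
    s_ca = sum(ci * ai for ci, ai in zip(c, a))
    s_caa = sum(ci * ai * ai for ci, ai in zip(c, a))
    s_an = sum(ai * ni for ai, ni in zip(a, n))
    # Stationarity gives m_i = n_i + c_i * (lam + h * a_i).  The two side
    # conditions then give a 2x2 linear system in lam and h:
    #   lam * s_c  + h * s_ca  = 0
    #   lam * s_ca + h * s_caa = -s_an
    det = s_c * s_caa - s_ca * s_ca
    lam = s_ca * s_an / det
    h = -s_c * s_an / det
    return [ni + ci * (lam + h * ai) for ni, ci, ai in zip(n, c, a)]


def iterate_to_limit(n, a, steps=20):
    """Reuse each set of adjusted frequencies as the next denominators."""
    c = list(n)  # first approximation: the sample frequencies themselves
    for _ in range(steps):
        c = min_chi_square_step(n, a, c)
    return c


sample = [1, 3, 6]          # hypothetical sample frequencies
coeffs = [1, -1, 0]         # side condition: m_1 - m_2 = 0
first_pass = min_chi_square_step(sample, coeffs, sample)
limit = iterate_to_limit(sample, coeffs)
```

For this sample the limiting values are (2, 2, 6), which are also the values that maximize (6.1) under the same two side conditions.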

7. Second-order tables with known expected marginal totals. As stated in 
section 2, the integration of available techniques is facilitated by regarding 
maximum likelihood statistics as the best practical approximations to the 
corresponding ideal linear estimates. Since this important principle may not 
be immediately obvious, it will be illustrated for the important special case of 
second-order tables for which the expected marginal totals are known. 

Consider a sample of n observations arranged on two bases of classification 
and presented in a table containing r rows and s columns. The universe of N 



244 


JOHN H. SMITH 


observations has been completely enumerated and classified on each basis 
separately but not cross-classified; i.e., universe totals of first order classes are 
known. 

For the cell in the ith row and the jth column, let $p_{ij}$ represent the universe
cell proportion; $n_{ij}$, the sample frequency; $np_{ij}$, the expected value of $n_{ij}$;
and $m_{ij}$, the maximum likelihood estimate of $np_{ij}$. Indicating summation
by substituting a dot for the letter over which summation is to be performed, 
the known marginal totals satisfy the equations 

(7.1) $Np_{i.} - N_{i.} = 0$,

$Np_{.j} - N_{.j} = 0$,

where $p_{i.}$ and $p_{.j}$ are the universe proportions and $N_{i.}$ and $N_{.j}$ are the known
universe totals in the ith row and the jth column, respectively.

When n observations of a random sample are arranged according to two 
bases of classification in a table with r rows and s columns for which the r + s
marginal totals are known, the typical equation of maximum likelihood can 
be obtained by maximizing, subject to side conditions based on equations (7.1), 
the likelihood function 

(7.2) $L = \sum\sum n_{ij} \log m_{ij} - \sum a_i (m_{i.} - np_{i.}) - \sum b_j (m_{.j} - np_{.j})$,

with respect to the maximum likelihood estimates $m_{ij}$, where $a_i$ and $b_j$ are typical
Lagrange multipliers. Setting the partial derivative with respect to $m_{ij}$ equal
to zero and transposing, the typical equation of maximum likelihood can be 
written 

(7.3) $n_{ij}/m_{ij} = a_i + b_j$.

Since equations (7.3) are not linear in their unknowns, the reader’s first 
reaction might well be to agree with a certain anonymous critic that “their 
solution is difficult.” This impression of great difficulty is probably the chief 
reason that previous writers have not used the method of maximum likelihood 
for this type of problem even after they had developed a set of techniques
adequate for the solution of the equations of maximum likelihood. In other words,
all that was needed was the integration of available techniques as will now 
be shown. 

In 1940, Deming and Stephan [2] derived a set of normal equations for the
adjustment of a set of second-order cell frequencies to known expected marginal 
totals by the method of least squares in which each sample frequency is weighted 
by its own reciprocal. This method yields statistics which are efficient according 
to the theory of large samples, but they do not satisfy the criterion of maximum 
likelihood exactly. The same article also presented an easier method of
iterative proportions which, unfortunately, does not yield least squares
statistics. In 1942, Stephan [5] developed an improved iterative process which
yields statistics which satisfy the criterion of least squares with arbitrarily
chosen weights. The foregoing developments are presented in greater detail
in Deming’s book [1] in which Deming adapts Stephan’s iterative method to 
the particular case in which each sample frequency is weighted by its own 
reciprocal so as to yield solutions for the normal equations derived in the joint 
article [2].

In Deming’s notation, equation 8 of Stephan’s article [5, p. 169] can be written

(7.4) $m_{ij} = c_{ij}(p_i + q_j - 1) + n_{ij}$,

an expression obtained by substituting $c_{ij}$ for $np_{ij}$ in the denominators of
chi-square and minimizing with respect to the statistics in the numerators. Hence,
if exact values of the $np_{ij}$ were used for the $c_{ij}$, the Stephan iterative method
would yield ideal linear estimates. Unless these parameters are implied by
some hypothesis to be tested, it is necessary, in practice, to estimate the $np_{ij}$
from sample data. In order to secure maximum likelihood estimates of expected
cell frequencies by means of the Stephan iterative method, the adjusted
frequencies based on first approximations to the $c_{ij}$ should be used as second
approximations to the $c_{ij}$, etc. In this way, maximum likelihood statistics can
be derived to any desired degree of approximation. At this point it should 
be emphasized that the preceding statement applies not only to the class of 
problems considered in this section but also to the wider class of problems for 
which the Stephan iterative method provides solutions. 
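The procedure just described can be sketched in code. This is an illustration under assumptions rather than Stephan's own computing routine: the function names are hypothetical, the inner solve is a plain alternating sweep over rows and columns, `u[i] + v[j]` stands for $p_i + q_j - 1$ in equation (7.4), and all denominators are assumed to stay positive. The outer loop replaces the $c_{ij}$ by the adjusted frequencies from the previous round.

```python
def adjust_once(n, row_tot, col_tot, c, sweeps=200):
    """Adjust table n to the given marginal totals with denominators c.

    Finds m[i][j] = n[i][j] + c[i][j] * (u[i] + v[j]) whose row and column
    sums equal row_tot and col_tot, by alternating row and column solves.
    """
    r, s = len(n), len(n[0])
    u = [0.0] * r
    v = [0.0] * s
    for _ in range(sweeps):
        for i in range(r):
            gap = row_tot[i] - sum(n[i]) - sum(c[i][j] * v[j] for j in range(s))
            u[i] = gap / sum(c[i])
        for j in range(s):
            gap = (col_tot[j] - sum(n[i][j] for i in range(r))
                   - sum(c[i][j] * u[i] for i in range(r)))
            v[j] = gap / sum(c[i][j] for i in range(r))
    return [[n[i][j] + c[i][j] * (u[i] + v[j]) for j in range(s)]
            for i in range(r)]


def successive_approximations(n, row_tot, col_tot, rounds=20):
    """Use each adjusted table as the denominators for the next round."""
    c = [list(row) for row in n]  # first approximation: sample frequencies
    for _ in range(rounds):
        c = adjust_once(n, row_tot, col_tot, c)
    return c


sample = [[1, 4], [3, 2]]  # the frequencies of equation (7.5)
first_round = adjust_once(sample, [5, 5], [5, 5], sample)
limit = successive_approximations(sample, [5, 5], [5, 5])
```

With the frequencies of equation (7.5) and all expected marginal totals equal to 5, the first round (denominators equal to the sample frequencies) gives 1.4 in the upper left cell, and the limit gives 1.5, in agreement with the estimates .14 and .15 of the parameter p quoted in this section.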

Unfortunately, theoretical discussions of previous writers contain confusing 
compensating errors which (1) present their own methods in an unnecessarily 
unfavorable light and (2) increase the difficulties involved in the introduction 
of the improvements in techniques suggested in section 9 which involve some 
degree of adaptation of techniques already available. For these reasons, it 
seems necessary to follow the arguments of previous writers in order to show 
the points at which improvements are needed. This can be done most
effectively in connection with Deming’s book [1] where the method of least squares
is presented in great detail. 

For the special case in which the sampling errors in the observations are
uncorrelated, the ideal criterion of least squares implies that the weight of each
observation should be inversely proportional to its sampling variance. This
criterion is accepted as well known by Deming who says that “the principle of
least squares requires the minimizing of the sum of the weighted squares of the
residuals” [1, p. 14] where “the weights of two functions are inversely
proportional to their variances” [1, p. 22]. Deming assumes that “there is no
correlation between the errors in the observations” with the qualification that
“this assumption covers a wide class of problems, but does fail to cover some”
[1, p. 49]. This assumption of uncorrelated errors is not applicable to
sample-frequency problems, of course, because the sample frequencies are correlated
with each other in such a way that the reciprocals of the ideal least squares
weights are not proportional to the sampling variances $np_{ij}q_{ij}$ but rather to
the expected frequencies $np_{ij}$ which appear in the denominators of chi-square.





In this connection it is interesting to note that Deming himself insists that 
“there is only one principle of least squares, namely, the minimizing of $\chi^2$.”
[1, p. 51]. However, the method currently in use for the minimizing of chi-square 
was that given by Fisher [3] which leads to equations which are difficult to solve 
even for such a simple example as the one presented in section 2 above. 

Deming and Stephan are to be commended for seeking an easier method 
but there is no justification (even as a device for saving effort) for their
modification of the “principle of least squares” so as to imply erroneously that

(1) weights of correlated sample frequencies are inversely proportional to 
their variances, and 

(2) sample frequencies are, in general, approximately proportional to their 
own sampling variances. 

Strangely enough, these two errors were applied in combination by Deming and 
Stephan to obtain good practical approximations to the ideal least squares 
weights. It might be argued that the second misleading implication is really 
not an error because it is offered as a simplifying approximation, but it is an 
integral part of both the normal equations approach in the joint article [2] 
and Deming’s adaptation [1] of the Stephan iterative method; that is, in each 
case the method would have to be revised if better approximations to the ideal 
least squares weights were used. More explicitly, Deming (1) uses $n_{ij}$ for
Stephan’s $c_{ij}$ in equation (7.4); (2) identifies it with the other $n_{ij}$ in the same
equation; and (3) reduces the equation to a different form thus effectively preventing
the use of successive approximations to the $c_{ij}$ without returning to Stephan’s
iterative method in the general form given by equation (7.4) above which 
Deming does not present at all. Results of the joint article [2] are quoted by 
Stephan [5] without any explanation of the nature of the errors, but none of 
these results are used in the development of his iterative method which as noted 
above, is applicable to any arbitrarily chosen set of weights. The fact that 
Stephan corrected the second error without correcting the first implies that the 
weights he actually used are unsatisfactory. In Deming’s adaptation of the 
Stephan iterative method, a much better set of weights is obtained, not by
correcting the first offsetting error overlooked by Stephan, but by resurrecting the
second offsetting error which Stephan had corrected. Since this error is an 
integral part of Deming’s adaptation, Deming’s theoretical discussion implies 
that his own efficient statistics are only rough approximations which are definitely 
inferior to the inefficient statistics obtained by means of the weights chosen by 
Stephan. These inconsistencies are most clearly brought out by Deming when 
he says: 

“Strictly, in random sampling, the reciprocal of the weight of $n_{ij}$ is $np_{ij}q_{ij}$, which is
nearly equal to $n_{ij}q_{ij}$, where p and q have their usual connotations. But since factors
proportional to the weights may be substituted for them, it is sufficient to use $n_{ij}$ as the
reciprocal of the weight in cell ij, since the values of $q_{ij}$ do not usually vary much over the
table.” [1, p. 102.]

In any given problem the seriousness of the error in the first statement in
the foregoing quotation depends on the variation among the $q_{ij}$'s. In the
particular example used by Deming the error is of considerable importance because
the largest $q_{ij}$ is more than 40 per cent larger than the smallest $q_{ij}$. The weights
actually used by Deming agree with weights implied by the ideal method of least
squares except for sampling errors in the $n_{ij}$; hence, the error in any relative
weight converges stochastically to zero so that Deming’s statistics are efficient
according to the theory of large samples. The efficiency of Deming’s statistics 
is inconsistent with the theory presented by Deming which implies erroneously 
that efficiency of estimation depends on approximate equality of cell proportions. 
If this argument were true it would apply also to the method of maximum 
likelihood and all other methods which yield efficient practical statistics in 
sample-frequency problems. The foregoing discussion, together with the results
of section 8, shows that the theory as presented by Deming has the following
seriously misleading features: 

(1) it is based on a paradox in which a good final result is obtained by means 
of compensating errors; 

(2) it presents his efficient statistics in an unnecessarily unfavorable light; 

(3) it emphasizes the irrelevant condition of approximate equality of universe 
cell proportions; 

(4) it fails to mention the important condition of proportionality by rows 
and columns; and 

(5) it makes least squares, minimum chi-square, and maximum likelihood 
seem to be competing alternative methods. 

Of these undesirable characteristics, the last two are probably the most serious 
because they make the effective integration and adaptation of statistical
techniques more difficult. As has been shown in sections 4, 5, and 6, the
sample-frequency form of the ideal method of least squares is the method of minimum
chi-square which always leads (by means of appropriate practical
approximations to unknown weights) to maximum likelihood statistics; in other words,
the methods are equivalent from a practical point of view. 

Since the ideal method of least squares based on the unknown $np_{ij}$ determines
fully efficient, but theoretical, ideal linear estimates, the efficiency of practical 
approximations to ideal linear estimates depends on the accuracy with which 
the denominators of chi-square are estimated. For the unknown denominators 
$np_{ij}$, Deming uses the sample frequencies $n_{ij}$ while the method of maximum
likelihood implies the use of the corresponding maximum likelihood estimates— 
statistics which, in general, have smaller sampling variances. The foregoing 
argument suggests that maximum likelihood statistics are slightly superior to 
Deming’s statistics for any given finite value of n and that their relative
advantage increases as the sample size decreases. In large samples both methods
yield efficient statistics because the relative errors in the weights implied by
either method converge stochastically to zero as n increases. Although the
advantage of maximum likelihood statistics over Deming’s statistics is
unimportant except in small samples, it can be shown that Deming’s choice of weights
leads to imperfectly compensated negative errors of estimation even in his 
large sample of 33,837 observations. 





Deming weights each sample frequency by its own reciprocal. Positive errors 
of sampling decrease the value of the reciprocal and thus increase the absolute 
size of the required negative adjustments. Negative errors of sampling increase 
the value of the reciprocal and thus decrease the size of the positive adjustment. 
Thus every error of sampling (either positive or negative) leads to a negative 
error of estimation due to inappropriate weighting. Because the sum of all 
adjustments must be zero, these negative errors of estimation are compensated 
on the average but more or less imperfectly. The net effect of this imperfect 
compensation of negative errors of estimation is that Deming’s statistics are 
too small in those cells in which the relative adjustments (either positive or 
negative) are large, and vice versa. In a preliminary draft of this article, 
this type of error of estimation was studied by comparing Deming’s statistics 
with the corresponding maximum likelihood statistics in connection with Deming’s
example involving 33,837 observations. Although errors of estimation of the
type under discussion are apparent, they are, of course, extremely small in such
a large sample. For this reason the large-sample comparison has been deleted
in favor of simple hypothetical examples designed to throw light on similar errors 
of estimation in statistics derived by Fisher’s method of minimum chi-square 
as well as in those derived by Deming’s adaptation of Stephan’s iterative 
method. 

Consider a set of sample frequencies in a two-by-two table for which all 
expected marginal totals are equal. For this special case, the cell proportions 
on each diagonal are equal and the ideal linear estimate (which is also the 
maximum likelihood estimate) of any cell proportion is the mean of the two 
sample cell proportions on its diagonal. For the same case, Deming’s adaptation 
of the Stephan iterative method yields an estimate for each cell which is
proportional to the harmonic mean of sample proportions on its diagonal while
Fisher’s method of minimum chi-square yields estimates proportional to the 
corresponding quadratic means. 

As a numerical example of the foregoing problem consider the set of
frequencies

(7.5) $n_{11}, n_{12}, n_{21}, n_{22} = 1, 4, 3, 2$,

obtained in a sample of 10 observations selected at random from a universe 
in which the cell proportions are known to be

(7.6) $p_{11}, p_{12}, p_{21}, p_{22} = p, 0.5 - p, 0.5 - p, p$.

As estimates of the parameter p, the ideal linear estimate is .15, Deming’s 
adaptation of the Stephan iterative method yields .14, and Fisher’s method of 
minimum chi-square yields .1545 to four decimal places, the other two estimates 
being exact. The results illustrate the imperfectly compensated errors of 
estimation explained previously. The two sample frequencies on the principal 
diagonal ($n_{11}$ and $n_{22}$) have greater relative dispersion than the frequencies on
the other diagonal. For this reason, the relative adjustments made by Deming’s
method are greater and according to the principle of imperfectly compensated 
negative errors of estimation, the estimate of p obtained by Deming’s method 
is smaller than the ideal linear estimate of p. Fisher’s method of minimum 
chi-square yields an estimate of p which is greater than the ideal linear estimate. 
In fact, one should usually expect imperfectly compensated errors of estimation 
in statistics derived by Fisher’s method of minimum chi-square to be opposite in 
sign and about half as large as those in the corresponding statistics derived by 
means of Deming’s adaptation of the Stephan iterative method. 
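The three estimates of p quoted for the frequencies (7.5) follow directly from the arithmetic, harmonic, and quadratic means of the two frequencies on each diagonal. A short sketch of that arithmetic is given here; the function name and the helper `share` are hypothetical, and the only fact used beyond the text is that the two cell proportions in a row sum to 0.5 under (7.6).

```python
import math


def three_estimates(n11, n12, n21, n22):
    """Estimates of p under (7.6) from the diagonal means of a 2x2 sample."""

    def share(main, other):
        # p such that p / (0.5 - p) equals main / other
        return 0.5 * main / (main + other)

    # Ideal linear (maximum likelihood) estimate: arithmetic means.
    ideal = share((n11 + n22) / 2, (n12 + n21) / 2)
    # Reciprocal weighting of each sample frequency: harmonic means.
    harmonic = share(2 * n11 * n22 / (n11 + n22),
                     2 * n12 * n21 / (n12 + n21))
    # Minimum chi-square with estimated denominators: quadratic means.
    quadratic = share(math.sqrt((n11 ** 2 + n22 ** 2) / 2),
                      math.sqrt((n12 ** 2 + n21 ** 2) / 2))
    return ideal, harmonic, quadratic


ideal, harmonic, quadratic = three_estimates(1, 4, 3, 2)
```

For the frequencies 1, 4, 3, 2 the harmonic-mean estimate falls below the ideal linear estimate and the quadratic-mean estimate falls above it, with a deviation about half as large, as the text describes.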

At this point, it should be emphasized that Fisher does not recommend his 
own method of minimum chi-square in preference to the method of maximum 
likelihood. In fact, he presents the theory of estimation in such a way as to 
imply correctly that the method of maximum likelihood is superior, especially 
in small samples. Other writers have noted the small differences between 
equations of maximum likelihood and those for minimizing chi-square by Fisher’s 
method and some have even derived one set of equations from the other by 
neglecting higher order terms in a Taylor series expansion. These derivations 
are of no interest here because they seem to justify the method of maximum 
likelihood as a simple approximation to some more complicated method. This 
type of justification is both unnecessary and undesirable. It is more useful to 
regard the method of maximum likelihood as an approximation to a method— 
least squares—for which the theory is simpler. 

Skeptical readers who find the foregoing argument unconvincing may be able 
to profit from the following example. Consider the problem of estimating the 
parameter p where 2p is the proportion of white balls in an urn. A sample of 10
balls is selected and classified by the following process. Each white ball is 
placed in one of the cells on the principal diagonal of a two-by-two table, the 
particular cell being decided by the toss of a coin. A similar method is used for 
non-white balls placed in cells on the other diagonal. Assuming that the results 
of this process are given by equation (7.5), which of the three alternative
estimates of p given above should be preferred? Belief in the general superiority
of Fisher’s method of minimum chi-square seems to imply that the device of 
coin-tossing described in this example can be used in practical problems involving 
the estimation of the proportion of “successes” to secure estimates which are 
superior to the sample proportion—the ideal linear estimate in such cases. 
Even if it is possible to construct trivial special-case examples supporting some
complicated method for such problems, the general use in practical problems of
the coin-tossing device in connection with either Fisher’s method of minimum
chi-square or Deming’s adaptation of the Stephan iterative method would be
absurd, as this example is intended to emphasize.

8. The method of proportional distribution of marginal adjustments. The
method of proportional distribution of marginal adjustments is a general method
of adjusting sample frequencies so that their row and column totals agree with
known expected marginal totals. In other words, the adjusted frequency for
the cell in the ith row and the jth column is given by the equation


(8.1) $m^*_{ij} = n_{ij} + p_{i.} d_{.j} + p_{.j} d_{i.}$,

where

$d_{i.} = m_{i.} - n_{i.}$

and

$d_{.j} = m_{.j} - n_{.j}$


are the net adjustments in the sample cell frequencies of the ith row and the
jth column, respectively. The asterisk is used to distinguish maximum likelihood
estimates and the ideal linear estimates $m'_{ij}$ from the set of statistics
based on equation (8.1).
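Equation (8.1) translates directly into code. The sketch below is illustrative only: the function name, the 2-by-2 sample table, and the universe proportions are hypothetical, and the expected marginal totals are taken as n times the known universe proportions.

```python
def proportional_distribution(n, p_row, p_col):
    """Adjusted frequencies by equation (8.1).

    n is the sample table; p_row and p_col are the known universe
    proportions of the first order classes (rows and columns).
    """
    r, s = len(n), len(n[0])
    total = sum(sum(row) for row in n)
    # Net adjustments: expected marginal total minus sample marginal total.
    d_row = [total * p_row[i] - sum(n[i]) for i in range(r)]
    d_col = [total * p_col[j] - sum(n[i][j] for i in range(r))
             for j in range(s)]
    return [[n[i][j] + p_row[i] * d_col[j] + p_col[j] * d_row[i]
             for j in range(s)] for i in range(r)]


# Hypothetical sample of 100 observations and hypothetical proportions.
adjusted = proportional_distribution([[10, 20], [30, 40]],
                                     [0.35, 0.65], [0.45, 0.55])
```

In this hypothetical case the adjusted table is 14, 21 in the first row and 31, 34 in the second, so the row totals are 35 and 65 and the column totals are 45 and 55, equal to the expected marginal totals.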

The method of proportional distribution of marginal adjustments yields ideal 
linear estimates when the universe cell proportions are proportional by rows and 
by columns; i.e., when 

(8.2) $p_{ij} = p_{i.} p_{.j}$.

This important principle can be established by substituting in equation (7.4)
of section 7 the quantities

(8.3) $c_{ij} = np_{i.}p_{.j}$,

$p_i = 0.5 + d_{i.}/(np_{i.})$,

and

$q_j = 0.5 + d_{.j}/(np_{.j})$,

and reducing the typical equation of the ideal method of minimum chi-square
to the form of equation (8.1) which defines the method of proportional
distribution of marginal adjustments.
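The reduction can be written out in a few lines. The steps below are a worked sketch that uses only equations (7.4) and (8.3); no new quantities are introduced.

```latex
\begin{align*}
m_{ij} &= c_{ij}\,(p_i + q_j - 1) + n_{ij} \\
       &= n p_{i.} p_{.j}\left(\frac{d_{i.}}{n p_{i.}} + \frac{d_{.j}}{n p_{.j}}\right) + n_{ij} \\
       &= n_{ij} + p_{.j}\, d_{i.} + p_{i.}\, d_{.j}.
\end{align*}
```

The last line has the form of equation (8.1).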

Even in the absence of exact proportionality, under which it yields fully 
efficient statistics, the method of proportional distribution of marginal
adjustments has the following relative advantages over other available methods:

(1) ease of extension to tables of higher order; 

(2) exact agreement with known (expected) marginal totals; 

(3) simplicity of interpretation; 

(4) independence of computational errors; 

(5) rapidity of processing; 

(6) economy of effort; and 

(7) fully efficient criteria for testing the significance of departures from 
proportionality of rows and columns. 

Ease of extension to tables of higher order is a desirable property of the 
method of proportional distribution of marginal adjustments. Equation (8.1)
applies to the special case in which there are only two bases of classification.
In the more general case sample observations are cross-classified according to 
r bases of classification, each cell frequency in an rth order table being the
number of observations in the corresponding rth order class whose expected value
is to be estimated. The required adjustment for each first order class (obtained 
by subtracting the sample total from its known expected value) is distributed 
among the various cells in proportion to the universe totals of the corresponding 
(r - 1)th order classes to which the cells belong. The general process is
illustrated by

(8.4) $m^*_{ijk} = n_{ijk} + p_{i..} d_{.jk} + p_{.j.} d_{i.k} + p_{..k} d_{ij.}$,

the formula for estimating the expected frequency in the general cell of a third 
order table. 

Exact agreement with marginal totals follows easily from the method of 
proportional distribution and can be established algebraically by summing the 
estimation equation by first order classes; e.g., summing equation (8.1) by rows
and columns. In practice, discrepancies are always either errors of rounding
or mistakes in computation; they are never due to lack of convergence of iterative
processes as is often true in alternative methods of estimation. 

Although simplicity of interpretation is desirable in general, it is especially 
important when random sampling is an unrealistic abstraction. For example, 
the method of proportional distribution of marginal adjustments has been used 
to estimate the cell proportions in a two-way classification of incomes from known 
marginal proportions and a detailed cross classification at an earlier date. In 
this problem known shifts in income distributions made it evident that certain 
cells previously vacant should not have the zero proportions which would be 
estimated for them by other available methods of estimation. The ease with 
which the effects of the method of adjustment can be traced is important also 
in the analysis of the results of sample surveys in which various types of bias 
are important. 

The method of proportional distribution of marginal adjustments yields the 
estimated expected frequency for any cell by a single sequence of computations 
which is independent of the corresponding process for any other cell. Errors 
made in computing the estimate for any cell appear in marginal totals of esti¬ 
mates for all first order classes which include that cell. If only a few errors are 
made in a table they can be localized immediately and can be corrected without 
recomputing any estimates which are correct. 

In certain types of social surveys, rapidity of processing is so important that, 
as Deming puts it, “the delay of only the brief time required for adjustment
may not be advisable” [1, p. 102]. Under these conditions, it is important to
have a simple formula like equation (8.1) in which substitutions can be made 
rapidly. Even when the time element is relatively unimportant, the economy 
of effort and the ease of explaining the method to clerical assistants are often 
of practical importance. 





Finally, departures from proportionality among rows and columns often 
provide the chief element of interest in research studies—not only in social 
surveys of the type illustrated in Deming and Stephan’s example but also in 
biological sciences. The most effective tests of significance for the purpose of 
presenting statistical evidence of lack of proportionality are those based on 
statistics like those derived by the method of proportional distribution of marginal 
adjustments whose efficiency is 100 per cent when proportionality is exact. 

Even when proportionality is not exact, the efficiency of statistics derived 
by proportional distribution may be close to 100 per cent under fairly typical 
problem conditions such as those in the example by Deming and Stephan wherein 
the other more complicated methods require several times as much computational 
effort, but have little advantage over the easier method with respect to
efficiency of estimation in this particular problem.

9. Suggested improvements in techniques. In section 7, a method was 
outlined by which it is possible to derive sets of maximum likelihood statistics 
by merely integrating available techniques without changing any of them. 
In this section a number of improvements are suggested. At this point it should 
be emphasized that a given change is not an improvement merely because it 
yields slightly more accurate estimates or makes possible a slight saving of 
time and effort. In each case the research worker should consider saving of time 
and effort and accuracy of estimation simultaneously. In particular, it seems 
likely that most social surveys of the type considered by Deming and Stephan 
are characterized by approximate proportionality by rows and by columns— 
conditions relatively favorable to the simple method of proportional
distribution of marginal adjustments. It should be clearly understood that
suggestions in this section are intended for those research workers whose problems
justify a great deal more effort than is required to adjust sample frequencies 
by this simple method. 

Assuming that the problem at hand warrants the effort required to derive 
maximum likelihood estimates, the first consideration is the derivation of a 
set of $m_{ij}(1)$, first approximations to the $m_{ij}$, and a set of values of $p_i(1)$,
first approximations to the $p_i$. Even if proportionality by rows and by columns
is not closely approximated, use of the values of the $p_i(1)$ provided by equation (8.3)
is especially to be recommended. In the example used by Deming these
values for the $p_i(1)$ are so much better than the values recommended by Deming
that they save a large proportion of the effort required by the iterative process.
If rows and columns are approximately proportional, equation (8.1) should be
used to provide values of the $m_{ij}(1)$, in which case it is possible to use an iterative
process similar to the one used by Deming but based on the typical equation
of maximum likelihood (7.3) to achieve a given degree of accuracy in the
maximum likelihood estimates with even less effort. Under favorable conditions 
such as those in Deming’s example the suggested iterative process yields excellent
approximations to maximum likelihood estimates by means of the following
steps: 

1. Construct a set of first approximations to the r row components of the rs
maximum likelihood divisors $(a_i + b_j)$ by means of the equation

(9.1) $a_i(1) = n_{i.}/(np_{i.}) - 1/2$.

2. Compute successive approximations to the $a_i$ and $b_j$ by means of the
equations

(9.2) $b_j(g) = [n_{.j} - \sum_i m_{ij}(1) a_i(g)]/(np_{.j})$,

(9.3) $a_i(g + 1) = [n_{i.} - \sum_j m_{ij}(1) b_j(g)]/(np_{i.})$,

where $m_{ij}(1)$, the first approximation to $m_{ij}$, is derived by means of equation
(8.1). Just as in Deming’s iterative process, the expression in brackets is a 
series of products which can be subtracted in a single sequence of machine 
operations and the final division can be performed without having to record 
any of the intermediate results. 

3. Divide the sample frequencies by the maximum likelihood divisors to obtain
the maximum likelihood estimates

(9.4) $m_{ij} = n_{ij}/(a_i + b_j)$,

where limiting values of $a_i$ and $b_j$ are approximated as closely as desired by
successive approximations in the preceding equations.
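Steps 1 to 3 can be sketched as follows. This is a minimal illustration under assumptions, not a definitive implementation: the function name and the fixed number of rounds are hypothetical, the first approximations $m_{ij}(1)$ come from equation (8.1) with the plus signs implied by (7.4) and (8.3), and every divisor $a_i + b_j$ is assumed to stay positive.

```python
def ml_by_divisors(n, p_row, p_col, rounds=50):
    """Approximate maximum likelihood estimates by equations (9.1)-(9.4)."""
    r, s = len(n), len(n[0])
    total = sum(sum(row) for row in n)
    n_row = [sum(n[i]) for i in range(r)]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    # First approximations m_ij(1) from equation (8.1).
    m1 = [[n[i][j]
           + p_row[i] * (total * p_col[j] - n_col[j])
           + p_col[j] * (total * p_row[i] - n_row[i])
           for j in range(s)] for i in range(r)]
    # Step 1, equation (9.1): first approximations to the row components.
    a = [n_row[i] / (total * p_row[i]) - 0.5 for i in range(r)]
    b = [0.0] * s
    for _ in range(rounds):
        # Step 2, equation (9.2): column components from the current a.
        b = [(n_col[j] - sum(m1[i][j] * a[i] for i in range(r)))
             / (total * p_col[j]) for j in range(s)]
        # Step 2, equation (9.3): row components from the new b.
        a = [(n_row[i] - sum(m1[i][j] * b[j] for j in range(s)))
             / (total * p_row[i]) for i in range(r)]
    # Step 3, equation (9.4): divide sample frequencies by the divisors.
    m = [[n[i][j] / (a[i] + b[j]) for j in range(s)] for i in range(r)]
    return m, a, b


estimates, a, b = ml_by_divisors([[1, 4], [3, 2]], [0.5, 0.5], [0.5, 0.5])
```

For the frequencies of equation (7.5) with all expected marginal totals equal, the estimates settle at 1.5 and 3.5 on the two diagonals. When the sample marginal totals already equal the expected ones, every divisor is 1 and the sample table is returned unchanged.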

Under unfavorable conditions, the iterative process of this section is not 
always the easiest way to obtain satisfactory estimates. For example, when 
samples are small and/or rows and columns are not approximately proportional, 
it is better to use the iterative method as originally presented by Stephan where 
sample frequencies can be used for first approximations to the $c_{ij}$ and these may
be replaced by successively better approximations. 

The point made in the final paragraph of Fisher’s well-known book [3] that 
“in practice one need seldom do more than solve, at least to a good
approximation, the equation of maximum likelihood,” is strongly supported by the
developments of this article. In addition, the proof that the method of least squares
and the method of minimum chi-square always lead (by means of
approximations to ideal weights) to maximum likelihood statistics greatly facilitates the
adaptation of techniques developed in connection with these hitherto competing 
methods. 


REFERENCES 

[1] W. Edwards Deming, Statistical Adjustment of Data, John Wiley & Sons, 1943. 

[2] W. Edwards Deming and Frederick F. Stephan, “On a least squares adjustment of
a sample frequency table when the expected marginal totals are known,” Annals
of Math. Stat., Vol. 11 (1940), pp. 427-444.

[3] R. A. Fisher, Statistical Methods for Research Workers, 6th ed., Oliver and Boyd, 1936,
Ch. 9.





[4] Karl Pearson, “On the criterion that a given system of deviations from the probable
in the case of a correlated system of variables is such that it can be reasonably
supposed to have arisen from random sampling,” Phil. Mag., Vol. 60 (1900),
pp. 157-175.

[5] Frederick F. Stephan, “An iterative method of adjusting sample frequency tables
when expected marginal totals are known,” Annals of Math. Stat., Vol. 13 (1942),
pp. 166-178.

[6] E. T. Whittaker and G. Robinson, The Calculus of Observations, D. Van Nostrand
Company, 1924, Ch. 9.



A STATISTICAL PROBLEM CONNECTED WITH THE COUNTING OF 
RADIOACTIVE PARTICLES 

By Sten Malmquist 

Institute of Statistics, University of Upsala, Sweden

1. Introduction. Our problem refers to random events forming a sequence
in time or in space, e.g. particles emitted by radioactive matter. By omitting
certain elements of the given sequence, say f, we form another sequence, say g.
The rule of omission involves an arbitrarily prescribed constant u. The rule
to be followed in forming g is:

Case I: Let a be an element in f and g. The next element to be included
in g is then the first element in f which follows a after a distance greater than u.

Case II: Let a be an element in f and g. The next element to be included in
g is then the first element in f which follows a at a distance greater than u from
the preceding element in f, whether this belongs to g or not.
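The two rules of omission can be sketched in code. The following is only an illustration under stated assumptions: the function names `transform_case_i` and `transform_case_ii` are hypothetical, the positions of the elements of f are assumed to be given in increasing order, and the first element of f is assumed to belong to g.

```python
def transform_case_i(times, u):
    """Case I: keep an element if it lies more than u after the last KEPT element."""
    kept = []
    last_kept = None
    for t in times:
        if last_kept is None or t - last_kept > u:
            kept.append(t)
            last_kept = t
    return kept


def transform_case_ii(times, u):
    """Case II: keep an element if it lies more than u after the PRECEDING
    element of f, whether that element was kept or not."""
    kept = []
    previous = None
    for t in times:
        if previous is None or t - previous > u:
            kept.append(t)
        previous = t  # every element of f resets the waiting distance
    return kept


if __name__ == "__main__":
    f = [0.0, 0.5, 0.9, 1.3, 2.5]
    print(transform_case_i(f, 0.6))
    print(transform_case_ii(f, 0.6))
```

In Case I only counted elements block the recorder, whereas in Case II every element of f does, so for the same u the sequence from Case II can never be longer than that from Case I.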

When the events are represented by impulses emitted by radioactive matter
and fed to a recorder with a constant resolving time u, the new sequence
consists of the counted impulses. The two cases correspond to the reaction of
different types of recorders. The distinction between the two transformations 
has caused some confusion. It has, however, been clearly pointed out by 
Ruark and Brammer [5]. 

v. Bortkiewicz [2] seems to have been the first to consider problems related
to the transformed sequence. Starting from investigations by Rutherford,
Geiger, and others, concerning the number of recorded α-particles during a
certain interval of time, say T, he observed that the distribution of this number
was similar to that of Poisson but with a slightly smaller dispersion. This fact
he supposed to be caused by a constant resolving time u of the recorder. By
means of certain assumptions he tried to calculate the effect on the mean and
the dispersion by the transformation in Case I, supposing the cumulative
distribution function F(t) for the distance between two consecutive elements in
the sequence f is given by

$F(t) = 1 - e^{-at},$

where here and in what follows, t denotes a non-negative variable.

Considering Case II with F(t) as above, Levert and Scheen [4] have recently 
worked out an expression for the distribution of the number of elements during 
T in the sequence g. 

Gnedenko [3] has considered the distribution of the number of lost elements 
in Case I with particular regard to the initial state of rest. 

Alaoglu and Smith [1] considered problems referring to successive
transformations of a sequence. When, for example, a sequence of particles enters
a tube-counter and amplifier, together acting with a resolving time $u_1$, and
the impulses are then fed to a recorder with resolving time $u_2 > u_1$, the
sequence of recorded impulses will be the result of two successive transformations.
If we have a scaling circuit between the counter and the recorder, we have to
make a transformation of another type between the two transformations in
Case I and Case II.

The present paper deals with the transformed sequence in Case I. The 
distribution function F(t) is supposed to be arbitrary. An advantage of this
generalization is that the formulas derived could be used in treating problems 
referring to successive transformations. 

The author wishes to express his sincere gratitude to Professor Herman Wold 
for stimulating discussions and valuable advice. 


2. Derivation of distributions for Case I. Suppose that the sequence f
has F(t) for distribution function for the distance between two consecutive
elements. F(t) is supposed to be independent of absolute time (space), and of
the preceding distance between two elements. When not stated otherwise,
we further suppose F(0) = 0.

Now let G(t) be the distribution function for the distance between two con¬ 
secutive elements in the transformed sequence g. Evidently G(t) also is inde¬ 
pendent of absolute time and of the preceding distance between two elements. 

We shall consider certain distribution functions connected with F(t). These 
functions will then be used in solving problems concerning the sequence g . 

Let $F_n(t)$ be the distribution function for the distance between the first and
the last of n + 1 consecutive elements in the sequence f. Then $F_n(t)$ is given
by the recursive system

(1) $F_{m+n}(t) = \int_0^t F_m(t - x)\,dF_n(x), \qquad F_1(t) \equiv F(t), \qquad (m, n \ge 1).$

As is easily seen, we have

$F_{m+n}(t) \le F_m(t) \cdot F_n(t);$

and, for t = u,

$F_n(u) \to 0$, as $n \to \infty$;

$\sum_{n=1}^{\infty} F_n(u) < \infty$, provided that $F_1(0) < 1$.

Alternatively, $F_n(t)$ could be deduced by the use of characteristic functions.
Still considering the sequence f, let $\Phi(t)$ be the distribution function for the
distance d between an arbitrarily chosen point and the following element.
Suppose that the arbitrary point is chosen so that the distance between the
preceding and the following element is x. Under this condition we have, in usual
symbols,

$P(d > t) = (x - t)/x$ for $t \le x$, $\qquad P(d > t) = 0$ for $t > x$.

Hence,

$\Phi(t) = 1 - \int_t^{\infty} \frac{x - t}{x}\,dH(x),$

where H(t) is the distribution function for the distance x.

To deduce H(t) we suppose that the distribution F(t) has a finite mean,

$m = \int_0^{\infty} t\,dF(t).$

By the definition of H(t), we then have

$H(x) = \frac{1}{m} \int_0^x t\,dF(t),$

so that

(2) $\Phi(t) = \frac{1}{m} \int_0^t [1 - F(x)]\,dx.$

The corresponding frequency function $\varphi(t)$ is given by

(3) $\varphi(t) = \frac{1 - F(t)}{m}.$

Consider n + 2 consecutive elements in f, say $a_0, a_1, \cdots, a_{n+1}$, where $a_0$
is an element in the transformed sequence g. The probability $P_n$ that the
next element in g following $a_0$ will be $a_{n+1}$ is given by

$P_n = F_n(u) - F_{n+1}(u), \qquad (n = 1, 2, \cdots),$

$P_0 = 1 - F(u).$

Now let $P_n(t)$ be the probability that the distance between $a_0$ and $a_{n+1}$ is
smaller than or equal to t, when $a_0$ and $a_{n+1}$ are two consecutive elements in the
sequence g. Then

$P_n(t) = \frac{1}{P_n} \int_0^u [F(t - x) - F(u - x)]\,dF_n(x), \qquad (n = 1, 2, \cdots),$

$P_0(t) = \frac{F(t) - F(u)}{1 - F(u)}.$


Let G*(t) be defined by

$G^*(t) = \sum_{n=0}^{\infty} P_n \cdot P_n(t) = F(t) - F(u) + \sum_{n=1}^{\infty} \int_0^u [F(t - x) - F(u - x)]\,dF_n(x); \qquad t > u.$

When G*(t) is a distribution function, then G*(t) equals G(t).

For $t_1 < t_2$ we obviously have $G^*(t_1) \le G^*(t_2)$.

For $t = \infty$,

$G^*(\infty) = 1 - F(u) + \sum_{n=1}^{\infty} \int_0^u [1 - F(u - x)]\,dF_n(x)$

$= 1 - F(u) + \sum_{n=1}^{\infty} F_n(u) - \sum_{n=1}^{\infty} F_{n+1}(u) = 1.$

Hence we take

(4) $G(t) = G^*(t); \qquad t > u,$

$G(t) = 0; \qquad t \le u.$

When the corresponding frequency functions g(t) and f(t) exist, we get

(5) $g(t) = f(t) + \sum_{n=1}^{\infty} \int_0^u f(t - x) f_n(x)\,dx; \qquad t > u.$

Dealing with a sequence of elements we are often concerned with the number 
of occurrences during a certain time T. 

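Let the mean number of occurrences during T be M(T). Supposing that
the mean $m = \int_0^{\infty} t\,dF(t)$ is finite and that F(0) < 1, we have

(6) $M(T) = T/m.$

We define

$K_1(t) = F(t)$ for $t \ge \epsilon$, $\qquad K_1(t) = 0$ for $t < \epsilon$,

$K_2(t) = F(t)$ for $t \ge \epsilon$, $\qquad K_2(t) = F(\epsilon)$ for $t < \epsilon$,

and denote the corresponding means by $M_1(T)$ and $M_2(T)$.

As is easily seen,

$M_1(t) \le M(t) \le M_2(t).$

Using (2),

$M_1(\epsilon) = \frac{\epsilon F(\epsilon) + \epsilon[1 - F(\epsilon)]}{\int_0^{\infty} x\,dK_1(x)} = \frac{\epsilon}{\int_0^{\infty} x\,dK_1(x)},$

$M_2(\epsilon) = \frac{\epsilon[1 - F(\epsilon)]}{\int_0^{\infty} x\,dK_2(x)} \cdot \{1 \cdot [1 - F(\epsilon)] + \cdots + n \cdot [1 - F(\epsilon)][F(\epsilon)]^{n-1} + \cdots\} = \frac{\epsilon}{\int_0^{\infty} x\,dK_2(x)}.$

Making N = T/ε and summing, we obtain

$M_1(T) = \frac{T}{\int_0^{\infty} x\,dK_1(x)} = \frac{T}{m - \int_0^{\epsilon} x\,dF(x) + \epsilon F(\epsilon)},$

$M_2(T) = \frac{T}{\int_0^{\infty} x\,dK_2(x)} = \frac{T}{m - \int_0^{\epsilon} x\,dF(x)}.$

By choosing ε arbitrarily small, we get

$M(T) \to T/m.$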

Let P(n, T) be the probability that we get n elements in f during a time T.
Suppose that the first of these elements, $a_1$, comes at $T_0 + x$, and the last,
$a_n$, at $T_0 + x + y$.

We then have

(7) $P(n, T) = \int_0^T \varphi(x)\,dx \int_0^{T-x} [1 - F(T - x - y)]\,dF_{n-1}(y).$


In (4) and (7) we have equations for the transformation in Case I. Because 
of the general form of F(t ), the formulas also can be used when we are concerned 
with successive transformations. It can further be remarked that the trans¬ 
formation of a sequence of impulses by passing a scaling circuit is expressed by 
the system (1). 


3. Results for a particular form for F(t). The preceding formulas will
now be used for a special distribution function F(t). Suppose that the
frequency function f(t) = dF(t)/dt is equal to the frequency function of the
distance between an arbitrary point and the following element.

From (3) we get

$F'(t) = \frac{1 - F(t)}{m},$

or, when F(0) = 0,

(8) $F(t) = 1 - e^{-at},$

(9) $f(t) = ae^{-at}$, where $1/a = m = \int_0^{\infty} t f(t)\,dt.$

By means of the theory of characteristic functions we have

(10) $f_n(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} [\eta(x)]^n e^{-itx}\,dx, \qquad f_1(t) \equiv f(t);$

where

(11) $\eta(x) = a \int_0^{\infty} e^{-at} e^{itx}\,dt = \frac{a}{a - ix}.$

Thus

(12) $f_n(t) = \frac{a^n}{2\pi} \int_{-\infty}^{+\infty} \frac{e^{-itx}}{(a - ix)^n}\,dx.$

For n = 1, we get

(13) $\frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{e^{-itx}}{a - ix}\,dx = e^{-at}.$

By differentiating (13) n − 1 times with respect to a we obtain

$\frac{(-1)^{n-1}(n - 1)!}{2\pi} \int_{-\infty}^{+\infty} \frac{e^{-itx}}{(a - ix)^n}\,dx = (-t)^{n-1} e^{-at}.$

Hence, from (12),

(14) $f_n(t) = \frac{a^n t^{n-1} e^{-at}}{(n - 1)!}.$

From (5) we obtain the frequency function for the transformed sequence g

(15) $g(t) = ae^{-at} + \sum_{n=1}^{\infty} \int_0^u ae^{-a(t-x)}\,\frac{a^n x^{n-1} e^{-ax}}{(n - 1)!}\,dx = ae^{au}e^{-at}; \qquad t > u,$

$g(t) = 0; \qquad t \le u.$

The mean $m_g$ is given by

$m_g = a \int_u^{\infty} t e^{au} e^{-at}\,dt = \frac{1}{a} + u.$

Remark: Suppose the constant u is allowed to vary independently of t and
that the frequency function of u is γ(u); we obtain

(16) $m_g = \int_0^{\infty} t\,dt \int_0^{\infty} g(u, t)\gamma(u)\,du = \int_0^{\infty} \frac{1}{a}\gamma(u)\,du + \int_0^{\infty} u\gamma(u)\,du = \frac{1}{a} + m(u).$


Now let the sequence of elements, g, by means of (5) be transformed into a
new sequence, h. When we are concerned with the counting of particles
emitted from radioactive matter, let the sequence g consist of impulses from
a counter-amplifier with resolving time u, feeding a recorder with resolving
time $u_1$. Then the elements in h are the counted impulses, it being supposed
that the tube-counter and the recorder react according to the assumptions.

We suppose $u_1 > u$. When $u_1 \le u$, the sequences g and h are identical.

Let $g_n(t)$ denote the frequency function of the distance between the first and
the last of n + 1 consecutive elements in g. We find, in the same way as
used in obtaining (14),

(17) $g_n(t) = \frac{a^n}{(n - 1)!}\,e^{anu}(t - nu)^{n-1}e^{-at}; \qquad t > nu.$





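Let h(t) be the frequency function for the distance between two consecutive
elements in the sequence h. Let further N be the greatest integer smaller
than or equal to $u_1/u$.

Using (4) and (5) we obtain

(18)
$h_1(t) = ae^{au}e^{-at} \sum_{n=0}^{N} \frac{a^n}{n!}(u_1 - nu)^n e^{anu}; \qquad t > u_1 + u;$

$h_2(t) = ae^{au}e^{-at} \sum_{n=0}^{N} \frac{a^n}{n!}[t - (n + 1)u]^n e^{anu}; \qquad (N + 1)u < t < u_1 + u;$

$h_3(t) = ae^{au}e^{-at} \sum_{n=0}^{N-1} \frac{a^n}{n!}[t - (n + 1)u]^n e^{anu}; \qquad u_1 < t < (N + 1)u.$

The mean $m_h$ is found to be

(19) $m_h = \left[\frac{1}{a} + u\right]\left[1 + \sum_{n=1}^{N}\left(1 - \sum_{v=0}^{n-1} \frac{a^v(u_1 - nu)^v}{v!}\,e^{-a(u_1 - nu)}\right)\right].$

We also have

$\int_{u_1+u}^{\infty} t h_1(t)\,dt < m_h < \int_{u_1}^{\infty} t h_1(t)\,dt,$

or

$\left[\frac{1}{a} + u_1 + u\right]\left[\sum_{n=0}^{N} \frac{a^n}{n!}(u_1 - nu)^n e^{-a(u_1 - nu)}\right] < m_h < \left[\frac{1}{a} + u_1\right]e^{au}\left[\sum_{n=0}^{N} \frac{a^n}{n!}(u_1 - nu)^n e^{-a(u_1 - nu)}\right].$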


We now consider the number of occurrences during a time interval T. Using
(6), (16), and (19) we immediately get the mean numbers of occurrences during T.
By (3), we get for the sequence g

(20) $\varphi_g(t) = \frac{a}{au + 1}; \qquad t < u,$

$\varphi_g(t) = \frac{ae^{au}e^{-at}}{au + 1}; \qquad t > u.$

Inserting (20), (15) and (14) in (7) and evaluating the integrals, we finally get

(21)
$P_g(n, T) = a_{n-1} - 2a_n + a_{n+1}; \qquad n < \frac{T}{u} - 1,$

$P_g(n, T) = a_{n-1} - 2a_n + (n + 1) - \frac{aT}{au + 1}; \qquad \frac{T}{u} - 1 \le n < \frac{T}{u},$

$P_g(n, T) = a_{n-1} - 2\left[n - \frac{aT}{au + 1}\right] + (n + 1) - \frac{aT}{au + 1}; \qquad \frac{T}{u} \le n < \frac{T}{u} + 1,$

where

(22) $a_n = \frac{e^{-a(T - nu)}}{au + 1} \sum_{v=0}^{n} \frac{(T - nu)^v a^v}{v!}(n - v), \qquad (n = 0, 1, \cdots),$

$a_{-1} = 0.$


When u = 0, we obtain

$a_n = e^{-aT} \sum_{v=0}^{n} \frac{a^v T^v}{v!}(n - v).$

For the sequence f we then get the Poisson distribution

(23) $P_f(n, T) = e^{-aT}\,\frac{(aT)^n}{n!}.$

The corresponding expression for the sequence h is much more complicated.
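The distribution (21) lends itself to direct computation. The sketch below is an illustration only, and rests on a particular reading of (21) and (22): $a_n$ is taken as $e^{-a(T-nu)}/(au+1)$ times $\sum_{v=0}^{n}(n-v)[a(T-nu)]^v/v!$ while $T - nu \ge 0$, and is replaced by $n - aT/(au+1)$ once $T - nu < 0$. The function names `a_coef`, `b_coef` and `prob_g` are hypothetical.

```python
import math


def a_coef(n, a, u, T):
    """a_n of (22), as read here; meaningful while T - n*u >= 0. a_{-1} = 0."""
    if n <= 0:
        return 0.0
    x = a * (T - n * u)
    s = sum((n - v) * x ** v / math.factorial(v) for v in range(n + 1))
    return math.exp(-x) * s / (a * u + 1.0)


def b_coef(n, a, u, T):
    """a_n while T - n*u >= 0, otherwise the limiting form n - aT/(au + 1)."""
    if T - n * u >= 0:
        return a_coef(n, a, u, T)
    return n - a * T / (a * u + 1.0)


def prob_g(n, a, u, T):
    """P_g(n, T) as the second difference in (21); covers all three ranges of n."""
    return b_coef(n - 1, a, u, T) - 2.0 * b_coef(n, a, u, T) + b_coef(n + 1, a, u, T)


if __name__ == "__main__":
    a, u, T = math.log(10.0), 0.2, 1.5  # 1/a = log10(e), the value used in section 4
    for n in range(9):
        print(n, round(239 * prob_g(n, a, u, T), 1))
```

With u = 0 the second difference reduces to the Poisson term (23), which gives a simple check on the reading adopted here.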


4. A statistical experiment. The following statistical experiment will serve 
as an illustration of the scheme dealt with in this paper—the transformation of 
a sequence and the resulting formulas, especially (21). 

Groups of five figures, the last rounded up if necessary, have been extracted
from tables of random sampling numbers [6]. Let each group denote the first
five digits of a decimal x, arbitrarily chosen between 0 and 1. The variable
x is supposed to have the distribution function t for 0 < t < 1. We now define
a new variable, y, given by

(24) $y = -k \log (1 - x)$, [or $y = -k \log x$].

The variable y has the distribution function given by (8), viz.

$F(t) = 1 - e^{-at}$, where $1/a = m = k \log e$.

Transforming each group, or number x , according to (24), we get a sample of 
consecutive distances between elements in the sequence / considered in the 
previous sections. Choosing a constant u , we can construct the corresponding 
sequence g . Beginning with a point, arbitrarily chosen on the first distance, 
we can finally count the number of elements in successive intervals of the same 
length. 

Take k = 1, u = 0.2 and T = 1.5. We then have for the sequences f and g:

$m_f = 1/a = \log e = 0.4343; \qquad m_g = 1/a + u = 0.6343;$

$\sigma_f = 1/a = 0.4343; \qquad \sigma_g = 1/a = 0.4343;$

$M_g(T) = T/m_g = 2.365;$

$M_f(T) = T/m_f = 3.454.$
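The constants of the experiment can be reproduced in a few lines. This is a minimal sketch assuming that "log" in (24) denotes the logarithm to base 10, which is what the value log e = 0.4343 implies; the function names `experiment_constants` and `spacing_from_uniform` are hypothetical.

```python
import math


def experiment_constants(k=1.0, u=0.2, T=1.5):
    """Theoretical means for the sequences f and g of the experiment."""
    m_f = k * math.log10(math.e)  # mean distance in f, equal to 1/a
    m_g = m_f + u                 # mean distance in g, from m_g = 1/a + u
    return {"m_f": m_f, "m_g": m_g, "M_f": T / m_f, "M_g": T / m_g}


def spacing_from_uniform(x, k=1.0):
    """Transformation (24): a uniform x in (0, 1) becomes an exponential distance y."""
    return -k * math.log10(1.0 - x)


if __name__ == "__main__":
    for name, value in experiment_constants().items():
        print(name, round(value, 4))
```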



The experiment yielded the following results:

For the sequence f: Number of elements 801; $\bar{m}_f = 0.450$.

For the sequence g: Number of elements 565; $\bar{m}_g = 0.648$.

In neither case is the deviation between the observed and theoretical means
statistically significant. In fact we have:

$\frac{(\bar{m}_f - m_f)\sqrt{800}}{\sigma_f} = 1.0; \qquad \frac{(\bar{m}_g - m_g)\sqrt{554}}{\sigma_g} = 0.8,$

which gives P = 0.3 and P = 0.4, respectively.


TABLE I

Nos. of intervals with n elements

             Sequence f                      Sequence g
  n     Observed   Expected     Observed   Expected     Expected
                   according               according    according
                   to (23)                 to (21)      to (23)

  0         6        7.6            5         8.2         23.7
  1        33       26.1           53        42.5         54.8
  2        48       45.1           82        81.8         63.3
  3        55       51.9           69        72.2         48.8
  4        36       44.8           23        29.2         28.1
  5        32       31.0            6         4.8         13.0
  6        17       17.8            1         0.2          5.0
  7-       12       14.7                                   2.4

  Σ       239      239            239       238.9        239
  Mean      3.331    3.454          2.310     2.36         2.31
  χ²                 4.825                    4.524       36.7
  P                  0.68                     0.34       <0.001

The classes n = 4, 5, 6 in the column for (21) and n = 6, 7- in the last column
are bracketed together.


The functions $a_n$ in (22) can be calculated by means of Pearson's tables of
the incomplete Γ-function [7]. In the notation of these tables we obtain

$e^{-x} \sum_{v=0}^{p} \frac{x^v}{v!} = 1 - I\left(\frac{x}{\sqrt{p + 1}},\, p\right).$

Hence

$a_n = \frac{n}{au + 1}\left[1 - I\left(\frac{x}{\sqrt{p + 1}},\, p\right)\right] - \frac{x}{au + 1}\left[1 - I\left(\frac{x}{\sqrt{q + 1}},\, q\right)\right],$

where

$x = a(T - nu); \qquad p = n - 1; \qquad q = n - 2.$

In the present case, however, we only need the numbers up to $a_7$. Accordingly,
the $a_n$ have been calculated directly.

The resulting theoretical and observed distributions for the number of
elements during T for the sequences f and g will be found in Table I. For
comparison, a Poisson distribution, with the same mean as observed for the sequence
g, is given. The result of a χ² test is also shown in Table I. Judged by the χ²
test the distributions (23) and (21) agree fairly well with the observed
distributions. As was to be expected, the Poisson distribution cannot be used for
the sequence g.

REFERENCES

[1] L. Alaoglu and N. M. Smith, “Statistical theory of a scaling circuit,” Phys. Rev.,
Vol. 53 (1938), pp. 832-836.

[2] L. v. Bortkiewicz, Die radioaktive Strahlung als Gegenstand
wahrscheinlichkeitstheoretischer Untersuchungen, Berlin, 1913.

[3] B. V. Gnedenko, “On the theory of Geiger-Müller counters,” (in Russian), Jour. for
Exp. and Theor. Phys., Vol. 11 (1941), pp. 101-106.

[4] C. Levert and W. L. Scheen, “Probability fluctuations of discharges in a
Geiger-Müller counter produced by cosmic radiation,” Physica, Vol. 10 (1943), pp.
225-238.

[5] A. E. Ruark and F. E. Brammer, “The efficiency of counters and counter circuits,”
Phys. Rev., Vol. 52 (1937), pp. 322-324.

[6] M. G. Kendall and B. Babington Smith, Tables of Random Sampling Numbers, Tracts
for Computers XXIV, Cambridge, 1939.

[7] Karl Pearson, Tables of the Incomplete Γ-function, Cambridge, 1922.



THE PROBABILITY FUNCTION OF THE PRODUCT OF TWO NORMALLY 
DISTRIBUTED VARIABLES 1 

By Leo A. Aroian 
Hunter College 

1. Introduction and summary. Let x and y follow a normal bivariate
probability function with means $\bar{X}$, $\bar{Y}$, standard deviations $\sigma_1$, $\sigma_2$, respectively, r
the coefficient of correlation, and $\rho_1 = \bar{X}/\sigma_1$, $\rho_2 = \bar{Y}/\sigma_2$. Professor C. C.
Craig [1] has found the probability function of $z = xy/\sigma_1\sigma_2$ in closed form as
the difference of two integrals. For purposes of numerical computation he has
expanded this result in an infinite series involving powers of z, $\rho_1$, $\rho_2$, and Bessel
functions of a certain type; in addition, he has determined the moments,
seminvariants, and the moment generating function of z. However, for $\rho_1$ and $\rho_2$
large, as Craig points out, the series expansion converges very slowly. Even
for $\rho_1$ and $\rho_2$ as small as 2, the expansion is unwieldy. We shall show that as
$\rho_1$ and $\rho_2 \to \infty$, the probability function of z approaches a normal curve and in
case r = 0 the Type III function and the Gram-Charlier Type A series are
excellent approximations to the z distribution in the proper region. Numerical
integration provides a substitute for the infinite series wherever the exact values of
the probability function of z are needed. Some extensions of the main theorem
are given in section 5 and a practical problem involving the probability function
of z is solved.


2. Theorems on approach to normality. The moment generating function
of z, $M_z(\theta)$, is [1]

(2.1) $M_z(\theta) = \frac{\exp\left\{\dfrac{(\rho_1^2 + \rho_2^2 - 2r\rho_1\rho_2)\theta^2 + 2\rho_1\rho_2\theta}{2[1 - (1 + r)\theta][1 + (1 - r)\theta]}\right\}}{\sqrt{[1 - (1 + r)\theta][1 + (1 - r)\theta]}}.$

Let $\bar{z}$ and $\sigma_z$ be the mean and the standard deviation of z, and $t_z = (z - \bar{z})/\sigma_z$.
Now

(2.2) $\bar{z} = \rho_1\rho_2 + r, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2 + 1 + r^2}.$

Using (2.2) we find in the usual way the moment generating function of $t_z$

(2.3) $M_{t_z}(\theta) = \frac{\exp\left\{\dfrac{-2rw + (\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2)w^2 + 4r^2w^2 - 2w^3(r^2 - 1)(\rho_1\rho_2 + r)}{2[1 - (1 + r)w][1 + (1 - r)w]}\right\}}{\sqrt{[1 - (1 + r)w][1 + (1 - r)w]}},$

where $w = \theta/\sigma_z$.


1 Presented to the American Mathematical Society, Oct. 28, 1944, New York City. 



Consider $r \ge 0$. Then in the limit as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever,

(2.4) $\lim M_{t_z}(\theta) = e^{\theta^2/2},$

and by the theorem of Curtiss [2] on moment generating functions we see in
the limit as $\rho_1, \rho_2 \to \infty$ the probability function of z approaches a normal curve
with mean $\bar{z}$ and variance $\sigma_z^2$, $r \ge 0$.

In case $-1 + \epsilon < r < 0$, $\epsilon > 0$, some care is required wherever

$\sqrt{\rho_1^2 + \rho_2^2 + 2\rho_1\rho_2 r}$

occurs. If one uses $\rho_1^2 + \rho_2^2 \ge 2\rho_1\rho_2$, the proof goes forward quite readily.
Hence we have proved the theorem:

Theorem (2.5). The distribution of z approaches normality with mean $\bar{z}$
and variance $\sigma_z^2$ as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever, $-1 + \epsilon < r \le 1$,
$\epsilon > 0$.

It is evident in Theorem (2.5) we may allow $\rho_1, \rho_2 \to -\infty$ without any other
changes. Theorems (2.6) and (2.7) are proved in essentially the same way
as (2.5).

Theorem (2.6). The distribution of z approaches normality with mean $\bar{z}$
and variance $\sigma_z^2$, if $\rho_1 \to \infty$, $\rho_2 \to -\infty$, $-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

Theorem (2.7). The distribution of z approaches normality if $\rho_1$ remains
constant, $\rho_2 \to \infty$, $-1 + \epsilon < r \le 1$, $\epsilon > 0$; or if $\rho_1$ remains constant, $\rho_2 \to -\infty$,
$-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

Naturally in any of the theorems $\rho_1$ and $\rho_2$ may be interchanged. In practice
$\rho_1$ and $\rho_2$ are usually positive. The approach to normality is more rapid if
both $\rho_1$ and $\rho_2$ have the same sign as r.
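The limit (2.4) can be examined numerically. The sketch below is an illustration, not part of the original argument: it evaluates the moment generating function of $t_z$ in the form (2.3), with $w = \theta/\sigma_z$, and compares it with $e^{\theta^2/2}$. The function name `mgf_standardized` is hypothetical.

```python
import math


def mgf_standardized(theta, rho1, rho2, r):
    """Moment generating function of t_z = (z - mean)/sd, following (2.2) and (2.3)."""
    sigma = math.sqrt(rho1 ** 2 + rho2 ** 2 + 2 * r * rho1 * rho2 + 1 + r ** 2)
    w = theta / sigma
    d = (1 - (1 + r) * w) * (1 + (1 - r) * w)
    if d <= 0:
        raise ValueError("theta lies outside the region where the function exists")
    numerator = (-2 * r * w
                 + (rho1 ** 2 + rho2 ** 2 + 2 * r * rho1 * rho2) * w ** 2
                 + 4 * r ** 2 * w ** 2
                 - 2 * w ** 3 * (r ** 2 - 1) * (rho1 * rho2 + r))
    return math.exp(numerator / (2 * d)) / math.sqrt(d)


if __name__ == "__main__":
    theta, r = 0.5, 0.3
    print("normal limit:", math.exp(theta ** 2 / 2))
    for rho in (1, 10, 100, 1000):
        print(rho, mgf_standardized(theta, rho, rho, r))
```

The printed values should drift toward the normal limit as both ratios grow, which is the behaviour asserted by Theorem (2.5).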


3. Numerical values. In order to show how closely the Type III and the
Gram-Charlier Type A series approximate the probability function of z, f(z),
or more precisely $f(z, \rho_1, \rho_2, r)$, we use numerical integration where

(3.1) $f(z, \rho_1, \rho_2, r) = I_1(z) - I_2(z),$

$I_1(z) = \frac{1}{2\pi\sqrt{1 - r^2}} \int_0^{\infty} \exp\left\{-\frac{1}{2(1 - r^2)}\left[(x - \rho_1)^2 - 2r(x - \rho_1)\left(\frac{z}{x} - \rho_2\right) + \left(\frac{z}{x} - \rho_2\right)^2\right]\right\} \frac{dx}{x},$

and $I_2(z)$ is the integral of the same function over $(-\infty, 0)$, [1]. Now $I_1(z)$
may be written as

(3.2) $I_1(z) = \frac{1}{\sqrt{1 - r^2}} \int_0^{\infty} \varphi(t_1)\varphi(t_2)\beta(t_3)\,\frac{dx}{x},$

where

$\varphi(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}}, \quad t_1 = \frac{x - \rho_1}{\sqrt{1 - r^2}}, \quad t_2 = \frac{\rho_2 - z/x}{\sqrt{1 - r^2}}, \quad \beta(t_3) = e^{t_3}, \quad t_3 = -rt_1t_2.$

We readily obtain $I_1(z)\sqrt{1 - r^2}$ by forming the product of $\varphi(t_1)$, $\varphi(t_2)$, $\beta(t_3)$,
and 1/x, using numerical integration by Weddle’s formula, the Gregory-Newton
formula, or the simple rectangular formula depending on circumstances.
The rectangular formula [3] is remarkably accurate when the function
$\varphi(t_1)\varphi(t_2)\beta(t_3)/x$ in the interval 0 to ∞ or 0 to −∞ is somewhat symmetrical.
Appropriate tables for $\varphi(t_1)$, $\varphi(t_2)$ (see [4]), $\beta(t_3)$ (see [5]) and 1/x (see [6]) are
readily available. In the important case of the independence of x and y, r = 0
and (3.2) becomes

(3.3) $I_1(z) = \int_0^{\infty} \varphi(t_1)\varphi(t_2)\,\frac{dx}{x}, \qquad t_1 = x - \rho_1, \quad t_2 = \rho_2 - \frac{z}{x}.$
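The integration described above can be sketched for the independent case r = 0. The code below is an illustration under stated assumptions: it uses the midpoint form of the rectangular rule, truncates the range of x at a hypothetical cut-off `x_max`, and combines $I_1(z) - I_2(z)$ into one sum by reflecting the negative half-line. The function names `phi` and `product_density` are hypothetical.

```python
import math


def phi(t):
    """Normal frequency function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def product_density(z, rho1, rho2, x_max=None, steps=20000):
    """f(z, rho1, rho2) for r = 0 by the midpoint rule applied to I_1(z) - I_2(z)."""
    if x_max is None:
        x_max = abs(rho1) + 10.0  # assumed cut-off; phi is negligible beyond it
    h = x_max / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        # contribution of I_1: x > 0, as in (3.3)
        total += phi(x - rho1) * phi(rho2 - z / x) / x
        # contribution of -I_2: the half-line x < 0, reflected to x > 0
        total += phi(-x - rho1) * phi(rho2 + z / x) / x
    return total * h


if __name__ == "__main__":
    rho1, rho2 = 0.0, 10.0
    sigma_z = math.sqrt(rho1 ** 2 + rho2 ** 2 + 1.0)
    # density in t_z units at z = 10, i.e. t_z = .9950372
    print(sigma_z * product_density(10.0, rho1, rho2))
```

For $\rho_1 = 0$, $\rho_2 = 10$ and z = 10 the result in $t_z$ units can be compared with the first entry of the "correct value" column in Table I of section 4.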


4. Approximations to f(z). When r = 0, the standard seminvariants $\xi_3$
and $\xi_4$ of z are

(4.1) $\xi_3 = \frac{6\rho_1\rho_2}{(\rho_1^2 + \rho_2^2 + 1)^{3/2}}, \qquad \xi_4 = \frac{6[2(\rho_1^2 + \rho_2^2) + 1]}{(\rho_1^2 + \rho_2^2 + 1)^2},$

remembering

$\bar{z} = \rho_1\rho_2, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 1}.$

In the Pearson system (see [7]) δ, the criterion, is

(4.2) $\delta = \frac{2\xi_4 - 3\xi_3^2}{6 + \xi_4},$

and for the probability function of z

(4.3) $\delta = \frac{2(\rho_1^2 + \rho_2^2 + 1)[2(\rho_1^2 + \rho_2^2) + 1] - 18\rho_1^2\rho_2^2}{(\rho_1^2 + \rho_2^2 + 1)[(\rho_1^2 + \rho_2^2 + 1)^2 + 2(\rho_1^2 + \rho_2^2) + 1]},$

and if $\rho_1 = \rho_2 = \rho$

(4.4) $\delta = \frac{2(4\rho^2 + 1)(2\rho^2 + 1) - 18\rho^4}{(2\rho^2 + 1)[(2\rho^2 + 1)^2 + (4\rho^2 + 1)]}.$

Now δ = 0, $\xi_3 \ne 0$, for the Type III function, and clearly $\lim_{\rho \to \infty} \delta = 0$.

By use of (3.3) the accurate values of f(z) have been calculated for various
combinations of $\rho_1$ and $\rho_2$ and compared with the Type III approximation using $\bar{z}$,
$\sigma_z$, $\xi_3$.

(4.5) Investigations so far completed show that for $\rho_1 \ge 4$ and $\rho_2 \ge 4$
simultaneously, and $|\delta| \le .008$, the Type III approximation will provide values
of $t_z$ correct to three significant figures at least where

(4.6) $\int_{-\infty}^{t_z^{(1)}} f(t_z)\,dt_z = \alpha, \qquad \int_{t_z^{(2)}}^{\infty} f(t_z)\,dt_z = \alpha$, and $.05 \ge \alpha \ge .005$.

These are the values of $t_z$ which would be needed in testing hypotheses. The
exact values of $t_z^{(1)}$ and $t_z^{(2)}$ for various values of $\rho_1$ and $\rho_2$ less than 4 will be
determined, it is hoped, in the future and will be published along with the
comparisons of the Type III values of $t_z$ with the accurate values of $t_z$ in the
important borderline cases of $\rho_1 = \rho_2 = 2$ and $\rho_1 = \rho_2 = 3$. The values of f(z)
for $\rho_1 = \rho_2 = 2$ and $\rho_1 = \rho_2 = 4$ have been calculated but these are being
withheld for a more complete table. The table of values of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and δ
(Table II) shows then that the Type III function is excellent along a band about
$\rho_1 = \rho_2$, since $\xi_3 \ne 0$, and δ is very small.

We use the Gram-Charlier Type A series of three terms to approximate the
probability function of z in $t_z$ units.

(4.7) $f(t_z) \sim \varphi(t_z) - \frac{\xi_3}{3!}\,\varphi^{(3)}(t_z) + \frac{\xi_4}{4!}\,\varphi^{(4)}(t_z)$

in the usual notation.


TABLE I

   $t_z$         $f(t_z)$ Correct value    Normal Curve    Gram-Charlier Type A

   .9950372        .2406367               .2431716        .2408235
  1.4925558        .1275209               .130970         .127484
  1.9900744        .0538213               .0550708        .053704
  2.4875930        .0184606               .0180791        .0184500
  2.9851116        .0052477               .0046338        .0052944
  3.4826302        .0012609               .0009272        .0012804
  3.9801488        .0002611               .0001449        .000260
  4.4776674        .0000467               .0000177        .0000425
  4.9751860        .00000745              .00000168       .00000555


(4.8) For $|\xi_3| \le .6$ and $\xi_4 \le .4$ simultaneously the Gram-Charlier Type A
series is quite adequate for finding probability levels such as those of (4.6).
These will in general give 3 significant figures for $t_z^{(1)}$ or $t_z^{(2)}$. In the special case
$\rho_1 = 0$, $\rho_2 = 10$, the Gram-Charlier Type A series differs from $f(t_z)$ very slightly
in the range $1 \le |t_z| < \infty$ (see Table I). Naturally the Gram-Charlier will
be used wherever Type III is not indicated, although there exist some
overlapping regions where either one may be used. It should be noticed that the
approach of f(z) to normality is more rapid along a row than down a diagonal.
In case either $\rho_1$ or $\rho_2$ is negative, we may make use of the equation

(4.9) $f(z, -\rho_1, \rho_2, r) = f(-z, \rho_1, \rho_2, -r).$

We note that when r = 0, $f(z, \rho_1, \rho_2)$ always possesses a discontinuity at z = 0
(see [1]). A table of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and δ is provided for values of $\rho_1$ and $\rho_2$ from
0 to 10 inclusive.
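The quantities of this section are simple to compute. The sketch below is an illustration for the case r = 0, reading the seminvariants as in (4.1), the criterion as in (4.2), and the three-term series as in (4.7), with the derivatives of the normal function written through the polynomials $t^3 - 3t$ and $t^4 - 6t^2 + 3$. The function names `seminvariants` and `gram_charlier` are hypothetical.

```python
import math


def seminvariants(rho1, rho2):
    """Standard seminvariants (4.1) and Pearson criterion (4.2) for r = 0."""
    s = rho1 ** 2 + rho2 ** 2 + 1.0
    xi3 = 6.0 * rho1 * rho2 / s ** 1.5
    xi4 = 6.0 * (2.0 * (rho1 ** 2 + rho2 ** 2) + 1.0) / s ** 2
    delta = (2.0 * xi4 - 3.0 * xi3 ** 2) / (6.0 + xi4)
    return xi3, xi4, delta


def gram_charlier(t, xi3, xi4):
    """Three-term Gram-Charlier Type A series (4.7) in t_z units."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    he3 = t ** 3 - 3.0 * t               # third derivative of phi is -he3 * phi
    he4 = t ** 4 - 6.0 * t ** 2 + 3.0    # fourth derivative of phi is he4 * phi
    return phi * (1.0 + xi3 * he3 / 6.0 + xi4 * he4 / 24.0)


if __name__ == "__main__":
    xi3, xi4, delta = seminvariants(0.0, 10.0)
    print(xi3, xi4, delta)
    print(gram_charlier(0.9950372, xi3, xi4))
```

For $\rho_1 = 0$, $\rho_2 = 10$ the output can be set beside the Gram-Charlier column of Table I and the corresponding cell of Table II.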



TABLE II*

 $\rho_2$ \ $\rho_1$        2            4            6            8           10

  0   $\bar{z}$        0            0            0            0            0
      $\sigma_z$    2.236068     4.123106     6.082762     8.062258    10.049876
      $\xi_3$          0            0            0            0            0
      $\xi_4$       2.160        .685121      .319942      .183195      .118224
      δ              .529         .205         .101         .059         .039

  2   $\bar{z}$        4            8           12.          16.          20.
      $\sigma_z$       3         4.582576     6.403124     8.306624    10.246951
      $\xi_3$        .888889     .498784      .274256      .167493      .111531
      $\xi_4$       1.259259     .557823      .289114      .172653      .113742
      δ              .020         .056         .056         .042         .031

  4   $\bar{z}$                    16.          24.          32.          40
      $\sigma_z$                 5.744563     7.280110     9.          10.816654
      $\xi_3$                    .506408      .373206      .263374      .189641
      $\xi_4$                    .358127      .224279      .147234      .102126
      δ                         -.0084        .0049        .014         .016

  6   $\bar{z}$                                 36.          48.          60
      $\sigma_z$                              8.544004    10.049876    11.704700
      $\xi_3$                                 .346314      .28373       .224503
      $\xi_4$                                 .163258      .118224      .087272
      δ                                      -.0054       -.00083       .0038

  8   $\bar{z}$                                              64.          80
      $\sigma_z$                                          11.357817    12.845233
      $\xi_3$                                              .262088      .226472
      $\xi_4$                                              .092663      .072507
      δ                                                   -.0034       -.0015

 10   $\bar{z}$                                                          100.
      $\sigma_z$                                                       14.177447
      $\xi_3$                                                           .210551
      $\xi_4$                                                           .059553
      δ                                                                -.0023

* The first value in a cell is $\bar{z}$, the second $\sigma_z$, the third $\xi_3$, the fourth $\xi_4$, the
fifth δ.





5. Some extensions. We may generalize our results to any case where x
and y are distributed approximately in a normal distribution such as the
distribution of the product of two means, when the sizes of the samples $N_1$ and $N_2$
are large and consequently $\rho_1$ and $\rho_2$ will be large. Another example occurs if
x and y each follows a Bernoulli probability function with parameters $p_1$ and
$p_2$ respectively where the number of trials in each case is large. We must warn
the reader that the condition $\rho_1 \to \infty$, $\rho_2 \to \infty$ alone does not mean that the
distribution of z approaches normality. Both x and y must be distributed normally.

The actual problem which gave rise to this investigation was the question
of determining the sum of a great many variates [8]. Let T variates $v_1, v_2,
\cdots, v_T$ be given whose sum $A = \sum_{i=1}^{T} v_i$ is desired. Clearly

$A = T\bar{V}_p, \qquad \bar{V}_p = \sum_{i=1}^{T} v_i/T.$

Now let us estimate A by $\bar{A} = T_s\bar{V}_s$ where $T_s$ is an estimate of T and $\bar{V}_s$ is an
estimate of $\bar{V}_p$. If $\sigma_{T_s}$ is very small, $\rho_1 = T/\sigma_{T_s}$ will be large and
$\rho_2 = \bar{V}_p/\sigma_{\bar{V}_s} = \sqrt{N}\,\bar{V}_p/\sigma_p$ will be very large. Assuming $T_s$ is distributed normally and
obviously $\bar{V}_s$ is distributed normally for N large, we see by the theorems of this
paper that $\bar{A}$ will be distributed normally. Confidence limits for A may be
calculated in the usual fashion as $\bar{A} \pm t\sigma_{\bar{A}}$, where t is determined by

$\int_t^{\infty} \varphi(t)\,dt = \alpha,$

with α generally chosen as .025 or less and

$\sigma_{\bar{A}} = \sqrt{T^2\sigma_{\bar{V}_s}^2 + \bar{V}_p^2\sigma_{T_s}^2 + \sigma_{T_s}^2\sigma_{\bar{V}_s}^2}.$

Stratification is also possible. It is interesting to note that many functions which
occur in life insurance are products. Such applications will be treated fully
elsewhere. Naturally the critical region, whether both tails or one tail of the
distribution should be used, depends on the alternatives to the hypothesis being
tested.
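The confidence limits described above can be sketched as follows. This is an illustration only, assuming that the standard deviation of the product is formed from the squares of the two means and the two variances as in the expression for the variance of a product of independent variates, and that the estimates are substituted for the unknown T and mean value. The function name `product_confidence_limits` and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def product_confidence_limits(t_est, sd_t, v_est, sd_v, alpha=0.025):
    """Limits A +/- t * sd_A for the product A = t_est * v_est, normal theory."""
    a_est = t_est * v_est
    sd_a = math.sqrt(t_est ** 2 * sd_v ** 2
                     + v_est ** 2 * sd_t ** 2
                     + sd_t ** 2 * sd_v ** 2)
    t = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail area alpha
    return a_est - t * sd_a, a_est + t * sd_a, sd_a


if __name__ == "__main__":
    low, high, sd_a = product_confidence_limits(1000.0, 10.0, 5.0, 0.1)
    print(low, high, sd_a)
```

With α = .025 in each tail the limits correspond to the usual 95 per cent interval.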

Generalizations of the main theorem are possible for the probability function
of $z = \prod_{i=1}^{T} x_i$ where $x_1, x_2, \cdots, x_T$ follow a multivariate normal probability
function. These will be investigated in a later paper. It may be noted that
J. B. S. Haldane has investigated the distribution of a product along different
lines [9].


REFERENCES 

[1] Cecil C. Craig, “On the frequency function of xy,” Annals of Math. Stat., Vol. 7 (1936),
pp. 1-16.

[2] J. H. Curtiss, “A note on the theory of moment generating functions,” Annals of
Math. Stat., Vol. 13 (1942), pp. 430-434.

[3] A. L. O'Toole, “On the degree of approximation of certain quadrature formulas,”
Annals of Math. Stat., Vol. 4 (1933), pp. 143-153.

[4] Arnold N. Lowan, Technical Director, Tables of Probability Functions, Vol. II.
National Bureau of Standards, Washington, D. C.

[5] Arnold N. Lowan, Technical Director, Tables of the Exponential Function $e^x$.
National Bureau of Standards, Washington, D. C.

[6] Arnold N. Lowan, Technical Director, Tables of Reciprocals of Integers from 100,000
through 200,009. Columbia Univ. Press, New York.

[7] Cecil C. Craig, “A new exposition and chart for the Pearson system of frequency
curves,” Annals of Math. Stat., Vol. 7 (1936), pp. 16-28.

[8] Leo A. Aroian, “Some methods for the evaluation of a sum,” Amer. Stat. Ass. Jour.,
Vol. 39 (1944), pp. 511-515.

[9] J. B. S. Haldane, “Moments of the distribution of powers and products of normal
variates,” Biometrika, Vol. 32 (1942), pp. 226-242.



NOTES 

This section is devoted to brief research and expository articles on methodology 
and other short items.


A REMARK ON CHARACTERISTIC FUNCTIONS 

By A. Zygmund 
University of Pennsylvania 

1. Let F(x), $-\infty < x < +\infty$, be a distribution function, and

$\varphi(t) = \int_{-\infty}^{+\infty} e^{itx}\,dF(x)$

its characteristic function. It is well known that the existence of $\varphi'(0)$ does
not imply the existence of the absolute moment

(1) $\int_{-\infty}^{+\infty} |x|\,dF(x).$

A simple example is provided by the function

$\varphi(t) = C \sum_{n=2}^{\infty} \frac{\cos nt}{n \log n},$

where C is a positive constant. Since the series on the right differentiated term
by term converges uniformly (see [1]), $\varphi'(t)$ exists (and is continuous) for all
values of t, and in particular at the point t = 0. Obviously $\varphi(t)$ is the
characteristic function of the masses $C/2n \log n$ concentrated at the points ±n
for n = 2, 3, ···. The constant C is such that the sum of all the masses is 1.
The divergence of the series $\sum 1/n \log n$ implies that in this particular case the
moment (1) is infinite.

In a recent paper (see [2], esp. p. 120, footnote), Fortet raises the problem of
whether the existence of $\varphi'(0)$ implies the existence of the first algebraic moment

(2) $\int_{-\infty}^{+\infty} x\,dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x\,dF(x).$

The main purpose of this note is to show that this is so. We shall even prove
a slightly more general result.

A function $\psi(t)$ defined in the neighborhood of a point $t_0$ is said to be smooth
at this point if

$\lim_{h \to +0} \frac{\psi(t_0 + h) + \psi(t_0 - h) - 2\psi(t_0)}{h} = 0.$

Clearly, if $\psi$ has a one-sided derivative at the point $t_0$, the derivative on the
other side also exists and has the same value. Thus the graph of $\psi(t)$ has no
angular point for $t = t_0$, and this explains the terminology. If $\psi'(t_0)$ exists and
is finite, $\psi(t)$ is smooth for $t = t_0$. The converse is obviously false, since any
function whose graph is symmetric with respect to $t = t_0$ is smooth at that
point.

Theorem 1. If the characteristic function $\varphi(t)$ is smooth at the point 0, then
a necessary and sufficient condition for the existence of $\varphi'(0)$ is the existence of the
moment (2). The value of (2) is $-i\varphi'(0)$.

In particular, the existence and finiteness of $\varphi'(0)$ implies the existence of (2).
That the converse is false is obvious. For if $a_0, a_1, a_2, \cdots$ are positive
numbers and $a_0 + 2a_1 + 2a_2 + \cdots = 1$, then $\psi(t) = a_0 + 2\sum_{n=1}^{\infty} a_n \cos nt$ is the
characteristic function of the distribution function F(x) corresponding to masses
concentrated at the integer points ±n and having the values $a_n$ there. Owing
to the symmetry of the masses, the number (2) exists, and is zero even if $\varphi(t)$
is non-differentiable for t = 0 (we may e.g. take for $\varphi(t)$ the Weierstrass
non-differentiable function $C \sum a^n \cos b^n t$, where C is a suitable constant).

Proof. We may write

\[ \varphi(t) = \int_0^{\infty} \cos xt\, dG(x) + i \int_0^{\infty} \sin xt\, dH(x) = \psi_1(t) + i\psi_2(t) \]

where 




274 


A. ZYGMUND 


Since $\psi_1(t)$ is even, the smoothness of $\varphi(t)$, and so also of $\psi_1(t)$, at the point
$t = 0$ implies that $\psi_1'(0)$ exists and is zero. If $h \to +0$,

\[ \frac{\psi_2(h) - \psi_2(0)}{h} = \int_0^{\infty} \frac{\sin xh}{h}\, dH(x) = \int_0^{1/h} + \int_{1/h}^{\infty} = A_h + B_h, \]

\[ |B_h| \le h^{-1} \int_{1/h}^{\infty} |dH| \le h^{-1} \left( \int_{1/h}^{2/h} dG + \int_{2/h}^{4/h} dG + \int_{4/h}^{8/h} dG + \cdots \right) = h^{-1}\, o(h + h/2 + h/4 + \cdots) = o(1), \]

by (3) and (5). Also

\[ A_h - \int_0^{1/h} x\, dH = \int_0^{1/h} \left( \frac{\sin hx}{hx} - 1 \right) x\, dH = \int_0^{1/h} O(x^2 h^2)\, x\, dG = \int_0^{1/h} O(x^2 h)\, dG = o(1), \]

by (3) and (4). Thus

\[ \frac{\psi_2(h) - \psi_2(0)}{h} = o(1) + \int_0^{1/h} x\, dH = o(1) + \int_{-1/h}^{1/h} x\, dF, \]

and so

\[ \frac{\varphi(h) - \varphi(0)}{h} = o(1) + i \int_{-1/h}^{1/h} x\, dF. \]

It follows that the existence of (2) is equivalent to the existence of the right-hand
side derivative of $\varphi(t)$ at the point $t = 0$, or, on account of smoothness,
to the existence of $\varphi'(0)$. Moreover, the value of (2) is $-i\varphi'(0)$. This
completes the proof of Theorem 1.

2. Suppose that a function $\psi(t)$ defined near the point $t_0$ satisfies for $h \to 0$
a relation

\[ \psi(t_0 + h) = \alpha_0 + \alpha_1 h/1! + \cdots + \alpha_{k-1} h^{k-1}/(k-1)! + [\alpha_k + o(1)]\, h^k/k!, \]

where $\alpha_0, \alpha_1, \cdots, \alpha_k$ are constants. Then $\alpha_k$ is called the $k$th generalized
derivative of $\psi$ at the point $t_0$. It will be denoted by $\psi_{(k)}(t_0)$. The existence
and finiteness of $\psi^{(k)}(t_0)$ implies the existence of $\psi_{(k)}(t_0)$ and both numbers
are equal.

Another generalization of higher derivatives is based on the consideration of
the symmetric differences

\[ \Delta_h^1 \psi(t_0) = \psi(t_0 + h) - \psi(t_0 - h), \]
\[ \Delta_h^2 \psi(t_0) = \psi(t_0 + 2h) - 2\psi(t_0) + \psi(t_0 - 2h), \]
\[ \Delta_h^3 \psi(t_0) = \psi(t_0 + 3h) - 3\psi(t_0 + h) + 3\psi(t_0 - h) - \psi(t_0 - 3h). \]

If $\Delta_h^k \psi(t_0)/(2h)^k$ tends to a limit as $h \to +0$, this limit is called the $k$th
symmetric derivative of $\psi$ at the point $t_0$. We shall denote it by $D_k\psi(t_0)$. Clearly,
$D_k\psi(t_0)$ exists and equals $\psi_{(k)}(t_0)$ if the latter number exists.
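The three differences written out above follow the binomial pattern $\Delta_h^k\psi(t_0) = \sum_j (-1)^j \binom{k}{j} \psi(t_0 + (k - 2j)h)$. The sketch below assumes that pattern for general $k$ and approximates the symmetric derivative at a small fixed step; the function names and the default step are illustrative choices, not part of the text.

```python
import math
from math import comb


def symmetric_difference(psi, t0, h, k):
    """k-th symmetric difference: sum_j (-1)^j C(k, j) psi(t0 + (k - 2j) h)."""
    return sum(
        (-1) ** j * comb(k, j) * psi(t0 + (k - 2 * j) * h)
        for j in range(k + 1)
    )


def symmetric_derivative(psi, t0, k, h=1e-3):
    """Finite-step approximation to the k-th symmetric derivative at t0."""
    return symmetric_difference(psi, t0, h, k) / (2 * h) ** k


# sin has first derivative 1 and third derivative -1 at the origin.
print(symmetric_derivative(math.sin, 0.0, 1))
print(symmetric_derivative(math.sin, 0.0, 3))
# |t| has no ordinary derivative at 0, yet its first symmetric difference vanishes.
print(symmetric_derivative(abs, 0.0, 1))
```

The last line illustrates why a symmetric derivative can exist where the ordinary one does not: any function whose graph is symmetric about $t_0$ has a vanishing first symmetric difference there.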

It is a simple matter to prove (see [3]) that if $k$ is a positive even integer,
and if the characteristic function $\varphi(t)$ has at $t = 0$ a finite symmetric derivative
$D_k\varphi(0)$, then the moment $\int_{-\infty}^{+\infty} x^k\, dF(x)$ exists, and its value is
$(-1)^{k/2} D_k\varphi(0)$. The existence of $\int_{-\infty}^{+\infty} x^k\, dF(x)$ obviously implies
(for $k$ even) the existence and continuity of $\varphi^{(k)}(t)$ for all $t$, and in
particular at the point $t = 0$.

In order to obtain an extension of Theorem 1 to the case of derivatives of
odd order, we have to generalize the notion of smoothness. We shall say that
a function $\psi(t)$ satisfies for $t = t_0$ condition $S_k$, $(k = 1, 2, \cdots)$, if

\[ \Delta_h^{k+1}\psi(t_0) = o(h^k) \quad \text{as } h \to +0. \]

For $k = 1$, condition $S_k$ is identical with smoothness at $t_0$. Clearly, if $\psi_{(k)}(t_0)$
exists, $\psi$ satisfies condition $S_k$ at $t_0$.

Theorem 2. Suppose that $k$ is a positive odd integer, and let $\varphi(t)$ be the
characteristic function of a distribution function $F(x)$. If $\varphi$ satisfies condition $S_k$
at the point 0, a necessary and sufficient condition for the existence of $D_k\varphi(0)$ is
the existence of the symmetric moment

\[ \int_{-\infty}^{+\infty} x^k\, dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x^k\, dF(x) \tag{6} \]

whose value is then equal to $i^{-k} D_k\varphi(0)$. In particular, the existence of $\varphi_{(k)}(0)$
implies that of (6).

The proof of Theorem 2 is analogous to that of Theorem 1. Let $G(x)$ and
$H(x)$ have the same meaning as before. Since $k + 1$ is even, condition $S_k$
at the point $t = 0$ gives

\[ \Delta_h^{k+1}\varphi(0) = \int_{-\infty}^{+\infty} (e^{ixh} - e^{-ixh})^{k+1}\, dF(x) = 2^{k+1}(-1)^{(k+1)/2} \int_{-\infty}^{+\infty} (\sin xh)^{k+1}\, dF(x) \]

\[ = 2^{k+1}(-1)^{(k+1)/2} \int_0^{\infty} (\sin xh)^{k+1}\, dG(x) = o(h^k), \]

so that

\[ \int_0^{1/h} (\sin xh)^{k+1}\, dG(x) = o(h^k), \]

\[ \int_0^{1/h} x^{k+1}\, dG(x) = o(h^{-1}), \tag{7} \]

\[ \int_{1/2h}^{1/h} dG(x) = o(h^k). \tag{8} \]


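On the other hand,

\[ i^{-k}\, \frac{\Delta_h^k\varphi(0)}{(2h)^k} = \int_{-\infty}^{+\infty} \left( \frac{\sin xh}{xh} \right)^k x^k\, dF(x) = \int_0^{\infty} \left( \frac{\sin xh}{xh} \right)^k x^k\, dH(x) = \int_0^{1/h} + \int_{1/h}^{\infty} = A_h + B_h, \]

say. Here

\[ |B_h| \le h^{-k} \int_{1/h}^{\infty} dG(x) = h^{-k} \left[ \int_{1/h}^{2/h} + \int_{2/h}^{4/h} + \cdots \right] = h^{-k} \left[ o(h^k) + o((h/2)^k) + \cdots \right] = o(1), \]

by (8). Since

\[ \left( \frac{\sin u}{u} \right)^k = \{1 + O(u^2)\}^k = \{1 + O(u)\}^k = 1 + O(u) \]

for small $u$, we immediately obtain

\[ A_h - \int_0^{1/h} x^k\, dH(x) = \int_0^{1/h} O(h x^{k+1})\, dG(x) = o(1), \]

by (7). Collecting the results, we see that

\[ i^{-k}\, \frac{\Delta_h^k\varphi(0)}{(2h)^k} - \int_0^{1/h} x^k\, dH(x) = i^{-k}\, \frac{\Delta_h^k\varphi(0)}{(2h)^k} - \int_{-1/h}^{1/h} x^k\, dF(x) = o(1), \]

which completes the proof of Theorem 2.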

One more remark. By Theorem 2, the existence of the first moment is equivalent
to the existence of the first symmetric derivative

\[ D_1\varphi(0) = \lim_{h \to 0}\, [\varphi(h) - \varphi(-h)]/2h. \]

In Theorem 1 we have a corresponding result for the ordinary first derivative

\[ \varphi'(0) = \lim_{h \to 0}\, [\varphi(h) - \varphi(0)]/h. \]

There is no discrepancy here since at every point where $\varphi$ is smooth the two
notions of derivative are equivalent.


REFERENCES 

[1] A. Zygmund, Trigonometrical Series, Warszawa-Lwów, 1935, p. 108.

[2] R. Fortet, "Calcul des moments d'une fonction de répartition à partir de sa caractéristique,"
Bull. des Sci. Math., Vol. 68 (1944), pp. 117-131.

[3] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, p. 90.




A LOWER BOUND FOR THE VARIANCE OF SOME UNBIASED 
SEQUENTIAL ESTIMATES 

By D. Blackwell and M. A. Girshick 
Howard University and Bureau of the Census 

Consider a sequence of independent chance variables $x_1, x_2, \cdots$ with identical
distributions determined by an unknown parameter $\theta$. We assume that $E x_i = \theta$
and that $W_k = x_1 + \cdots + x_k$ is a sufficient statistic for estimating $\theta$ from
$x_1, \cdots, x_k$. A sequential sampling procedure is defined by a sequence of
mutually exclusive events $S_k$ such that $S_k$ depends only on $x_1, \cdots, x_k$ and
$\sum P(S_k) = 1$. Define $W = W_k$ and $n = k$ when $S_k$ occurs. In a previous paper
by one of the authors [1] it was shown that if S k = W*C(/Si + • * * + 

(where $C(A)$ denotes the event that $A$ does not occur), the function $V(W, n) = E(x_1 \mid W, n)$
is an unbiased estimate of $\theta$, and $\sigma^2(V) \le \sigma^2(x_1)$. It is the purpose
of this note to obtain a lower bound for $\sigma^2(V)$. Our result is:

Theorem I. $\sigma^2(V) \ge \sigma^2(x_1)/E(n)$.

We remark that the lower bound is actually attained in the classical case of
samples of constant size $N$. For in this case (see [1]), $V = E(x_1 \mid W_N) = W_N/N$.
In fact we shall show that in a sense this is the only case in which the lower bound
is attained.

The proof of Theorem I depends on certain properties of sums of independent
chance variables. These, formulated more generally than is required for the
proof of Theorem I, are given in

Theorem II. Let $x_1, x_2, \cdots$ be independent chance variables with identical
distributions, having mean $\theta$ and variance $\sigma^2(x_1)$. Let furthermore $\{S_k\}$ be any
sequential test for which $E(n)$ is finite. Let $W = x_1 + \cdots + x_k$ when $n = k$.
Then

(a) $\sigma^2(W - \theta n) \le \sigma^2(x_1) E(n)$.

(b) If $\sigma^2(n)$ is finite, the equality sign holds in (a).

(c) $E[x_1(W - \theta n)] = \sigma^2(x_1)$.
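Parts (b) and (c) can be checked exactly on a small case. The sketch below is an illustration only: it assumes Bernoulli variables with success probability `p`, a hypothetical stopping rule supplied as a function of the observed path, and a forced stop at `max_n` so that every path is finite. Exact rational arithmetic avoids rounding.

```python
from fractions import Fraction


def enumerate_paths(p, stop, max_n):
    """Yield (probability, path) for every stopped path of a Bernoulli(p) sequence.

    `stop(path)` returns True when sampling ends after the last observation;
    sampling is forced to end once `max_n` observations have been taken.
    """
    stack = [(Fraction(1), ())]
    while stack:
        prob, path = stack.pop()
        if path and (stop(path) or len(path) == max_n):
            yield prob, path
            continue
        stack.append((prob * p, path + (1,)))
        stack.append((prob * (1 - p), path + (0,)))


def wald_moments(p, stop, max_n):
    """Return E(n), E[(W - theta n)^2], E[x_1 (W - theta n)] and var(x_1)."""
    theta, var_x = p, p * (1 - p)
    e_n = second_moment = cross = Fraction(0)
    for prob, path in enumerate_paths(p, stop, max_n):
        n, w = len(path), sum(path)
        dev = w - theta * n
        e_n += prob * n
        second_moment += prob * dev * dev
        cross += prob * path[0] * dev
    return e_n, second_moment, cross, var_x


# Stop at the first success, or after six observations at the latest.
e_n, second_moment, cross, var_x = wald_moments(
    Fraction(1, 3), lambda path: path[-1] == 1, 6
)
print(e_n, second_moment, var_x * e_n, cross)
```

Since $E(W - \theta n) = 0$, the second moment computed here is the variance in (a); for a bounded sample size it should agree exactly with $\sigma^2(x_1)E(n)$, and the cross moment with $\sigma^2(x_1)$.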

Proof of (a). Write $y_i = x_i - \theta$, and define $Y = y_1 + \cdots + y_k$ when
$n = k$. By definition,

\[ \sigma^2(W - \theta n) = \sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP. \tag{1} \]

To prove (a), we must verify that the series on the right of expression (1) converges
and has sum $\le \sigma^2(x_1)E(n)$. Now

\[ \sum_{k=1}^{N} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP \le \sum_{k=1}^{N} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP + \int_{n > N} (y_1 + \cdots + y_N)^2\, dP \]

\[ = \sum_{k=1}^{N} \int_{n \ge k} y_k^2\, dP + 2 \sum_{k=2}^{N} \int_{n \ge k} y_k (y_1 + \cdots + y_{k-1})\, dP. \tag{2} \]

Since the event $\{n \ge k\}$ is independent of $y_k$, each term in the second sum
vanishes and the first sum becomes

\[ \sum_{k=1}^{N} \int_{n \ge k} y_k^2\, dP = \sigma^2(x_1) \sum_{k=1}^{N} P\{n \ge k\} \]

\[ = \sigma^2(x_1)\, [P\{n = 1\} + 2P\{n = 2\} + \cdots + N P\{n = N\} + N P\{n > N\}] \le \sigma^2(x_1) E(n). \tag{3} \]

This establishes Theorem II(a).

Proof of Theorem II(b). Write $z_i = |y_i|$ and let $Z = z_1 + \cdots + z_k$ when
$n = k$. From (a) it follows that $\sigma^2[Z - nE(z_1)]$ is finite. If in addition
$\sigma^2(n) < \infty$ then $E(Z^2) < \infty$. Thus the series

\[ \sum_{k=1}^{\infty} \int_{S_k} (z_1 + \cdots + z_k)^2\, dP = \sum_{1 \le i, j \le k < \infty} \int_{S_k} z_i z_j\, dP \tag{4} \]

converges, so that the series

\[ \sum_{1 \le i, j \le k < \infty} \int_{S_k} y_i y_j\, dP \tag{5} \]

converges absolutely. The terms of the latter series may be arranged to yield

(A): $\sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\, dP = \sigma^2(W - \theta n)$

or to yield

(B): $\sum_{k=1}^{\infty} \int_{n \ge k} y_k^2\, dP + 2 \sum_{k=2}^{\infty} \int_{n \ge k} y_k (y_1 + \cdots + y_{k-1})\, dP = \sigma^2(x_1) E(n)$.

This proves Theorem II(b).

Proof of Theorem II(c). It follows from Theorem II(a) that $E[x_1(W - \theta n)]$
is finite. If we show that

\[ E(W - \theta n \mid x_1) = x_1 - \theta, \quad \text{i.e.} \quad E(Y \mid y_1) = y_1, \tag{6} \]

it will follow [1] that

\[ E[x_1(W - \theta n)] = E[x_1(x_1 - \theta)] = \sigma^2(x_1). \tag{7} \]

To verify (6), it is sufficient to show that if $f(x_1)$ is the characteristic function
of an event depending only on $x_1$ (i.e. $f(x_1) = 1$ when the event occurs, $f(x_1) = 0$
otherwise)

\[ E(f y_1) = E(fY). \tag{8} \]

Write $\phi_1 = 0$, $\phi_i = f \cdot (y_2 + \cdots + y_i)$, $i \ge 2$.

Then it is easily verified that

\[ E(\phi_j \mid x_1, \cdots, x_i) = \phi_i \quad \text{for } j \ge i, \tag{9} \]

\[ |\phi_i| \le \sum_{k=1}^{i} |y_k|, \tag{10} \]

\[ E(\phi_i) = 0. \tag{11} \]

Hence it follows [2] that $E\phi = 0$ where $\phi = \phi_i$ when $n = i$. In our case
$\phi = fY - f y_1$, and $E\phi = 0$ yields (6). This completes the proof of Theorem II.

Proof of Theorem I. In [1] it is proved that $E[x_1(W - \theta n)] = E[V(W - \theta n)]$.
Hence employing Theorem II we get

\[ \sigma^2(x_1) = E[V(W - \theta n)] = \sigma(V)\, \sigma(W - \theta n)\, \rho \tag{12} \]

where $\rho$, $(0 < \rho \le 1)$, is the coefficient of correlation between $V$ and $W - \theta n$.
Substituting for $\sigma(W - \theta n)$ we get

\[ \sigma^2(x_1) \le \sigma(V)\sigma(x_1)\sqrt{E(n)}\, \rho \le \sigma(V)\sigma(x_1)\sqrt{E(n)}. \tag{13} \]

Solving for $\sigma(V)$ we finally obtain

\[ \sigma^2(V) \ge \frac{\sigma^2(x_1)}{E(n)} \tag{14} \]

which proves Theorem I.¹

If $\sigma^2(n)$ is finite, the equality sign in (14) will hold if and only if $\rho = 1$. We
shall now prove the following.

Theorem III. Let $N$ be the minimum value of $n$ for which $P(n = N) \ne 0$.
Then, a necessary and sufficient condition that $\rho = 1$ is that $P(n = N) = 1$.

Proof. The sufficiency of this condition follows from the fact that if
$P(n = N) = 1$, $V = W/N$. To prove the necessity of this condition, we
observe that if $\rho = 1$, $V$ is a linear function of $W - n\theta$. That is,

\[ V = \alpha(W - n\theta) + \beta. \tag{15} \]

Now, since $EV = \theta$ and $E(W - n\theta) = 0$, it follows that $\beta = \theta$. Also, since
by hypothesis $\sigma^2(V) = \sigma^2(x_1)/E(n)$ and $\sigma^2(W - n\theta) = \sigma^2(x_1)E(n)$, it follows
that $\alpha = 1/E(n)$. Hence the estimate $V$ is given by

\[ V = \frac{W - n\theta}{E(n)} + \theta. \tag{16} \]


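1 Under certain regularity conditions Cramér has obtained the inequality

\[ \sigma^2(x) \ge 1 \Big/ E\left[ \left( \frac{\partial \log f}{\partial \theta} \right)^2 \right], \]

where $f = f(x, \theta)$ is the density function of $x$ ([3], p. 475). Thus with the same regularity
conditions, our inequality yields

\[ \sigma^2(V) \ge 1 \Big/ \left\{ E(n)\, E\left[ \left( \frac{\partial \log f}{\partial \theta} \right)^2 \right] \right\}, \]

which is a special case of the results presented by J. Wolfowitz in this issue of the Annals.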



Let $N$ be defined as above. We note that $N < \infty$ since by hypothesis $E(n) < \infty$.
Let $V_N$ be the estimate of $\theta$ when the sequential test terminates with $n = N$.
Then $V_N = W/N$. Substituting this value in (16) we get

\[ \frac{W}{N} = \frac{W - N\theta}{E(n)} + \theta. \]

We exclude the trivial case where $W = N\theta$. Then (16) yields $E(n) = N$.
That is, $P(n = N) = 1$. This proves the theorem.

We remark that $N$ may be a function of $\theta$ but for a fixed $\theta$, $n = N$ is fixed
when $\rho = 1$.


REFERENCES 

[1] D. Blackwell, "Conditional expectation and unbiased sequential estimation." Submitted
to Annals of Math. Stat.

[2] D. Blackwell and M. A. Girshick, "On sums of sequences of independent chance
vectors, with applications to the random walk in k dimensions," Annals of
Math. Stat., Vol. 17 (1946).

[3] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.


AN EXTENSION TO TWO POPULATIONS OF AN ANALOGUE OF
STUDENT'S t-TEST USING THE SAMPLE RANGE

By John E. Walsh 
Princeton University 

1. Summary. The modified t-test considered by Daly¹ (see [1]) is used to
develop one-sided significance tests to decide whether the mean of a new normal 
population exceeds the mean of an old normal population having the same 
variance. Significance tests are also developed to decide whether the mean of 
the new population is less than the mean of the old population. These tests 
require very little computation for their application and are approximately as 
powerful as the most powerful tests of these hypotheses. 

2. Introduction. Let $r_1, \cdots, r_n$, $(n \le 10)$, be independently distributed
according to a normal distribution with zero mean and unit variance. Let $r_{(u)}$
denote the $u$th largest of the $r$'s. Then Daly has shown how to determine
numbers $g_\alpha$ such that

\[ \Pr[\bar{r}/(r_{(n)} - r_{(1)}) > g_\alpha] = \alpha, \qquad \Pr[\bar{r}/(r_{(n)} - r_{(1)}) < -g_\alpha] = \alpha. \tag{1} \]

This note will use these relations to develop easily applied significance tests to
decide whether the mean $\nu$ of a new normal population exceeds the mean $\mu$ of


1 This problem is also considered by Lord in [2]. This note was in proof when [2] appeared. 




an old normal population with the same variance. Significance tests are also
developed to test $\nu < \mu$. The simplest case considered is that of testing a new
sample value $x$ on the basis of $n$ past sample values $y_1, \cdots, y_n$. Then the
significance test at significance level $\alpha$ to decide whether $\nu$ exceeds $\mu$ consists in
accepting $\nu > \mu$ if

\[ x > \bar{y} + g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}], \]

where $y_{(u)}$ is the $u$th largest of $y_1, \cdots, y_n$.

The significance test of $\nu < \mu$ consists in accepting $\nu < \mu$ if

\[ x < \bar{y} - g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]

These tests are generalized to the case in which $x$ is the mean of a sample of
size $r$ from the new population, each of $y_1, \cdots, y_n$ is the mean of a sample of
size $s$ from the old population, and $z$ is the mean of a sample of size $t$ from the
old population. Then the tests at significance level $\alpha$ take the form

\[ \text{Accept } \nu > \mu \text{ if } x > (1 - C_1)\bar{y} + C_1 z + g_\alpha [y_{(n)} - y_{(1)}]; \]
\[ \text{Accept } \nu < \mu \text{ if } x < (1 - C_1)\bar{y} + C_1 z - g_\alpha [y_{(n)} - y_{(1)}], \tag{2} \]

where $C_1$ is a given constant which is selected by the person applying the test.
The introduction of the terms $z$ and $C_1$ allows less reliable past information to
be utilized by lumping it together in the $z$ term and using the constant $C_1$ to
weight this information according to its relative importance with respect to
the $y$'s.

The power of test (2) is compared with that of the corresponding Student t-test
for the case $C_1 = 0$ and $n \le 10$. In this comparison the quantities $x, y_1, \cdots, y_n$
are considered to be the given sample values which are used for the test, that is,
the quantities from which the means $x, y_1, \cdots, y_n$ were formed are not given.
It is found that the power of the Student t-test is only slightly greater than that
of the corresponding test (2). For the cases considered, however, it is well
known that the most powerful test of $\nu > \mu$ using the quantities $x, y_1, \cdots, y_n$
is the appropriate Student t-test. Similarly for testing $\nu < \mu$. Thus the tests
(2) considered are approximately as powerful as the most powerful tests of
$\nu > \mu$ and $\nu < \mu$ which use $x, y_1, \cdots, y_n$.

Examination of (2) shows that the amount of computation required for the 
application of one of these tests is small. Consequently the tests (2) have the 
desirable properties of being easily computed and nearly as powerful as any 
tests which could be used for the given hypotheses. This suggests their use in 
repetitive testing procedures which are concerned with the testing of the mean 
of a new sample on the basis of the means of previous samples. 

3. Statement of tests. In this section three significance tests of increasing
generality are stated. It is to be observed that each test is a particular example
of the test following it so that tests (A) and (B) are special cases of test (C).
The reason for stating tests (A) and (B) is that these tests have a much simpler
appearance and will cover most cases of practical application.

(A). Let each of $x, y_1, \cdots, y_n$ represent the mean of a sample of size $r$; let
the values of the sample whose mean is $x$ have the distribution $N(\nu, \sigma^2)$ and the
values of the samples whose means are $y_1, \cdots, y_n$ have distribution $N(\mu, \sigma^2)$,
where the notation $N(\xi, \sigma^2)$ denotes the normal distribution with mean $\xi$ and
variance $\sigma^2$. Then the significance test of $\nu > \mu$ at significance level $\alpha$ is

\[ \text{Accept } \nu > \mu \text{ if } x > \bar{y} + g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]

The significance test to decide whether $\nu < \mu$ is

\[ \text{Accept } \nu < \mu \text{ if } x < \bar{y} - g_\alpha \sqrt{n + 1}\, [y_{(n)} - y_{(1)}]. \]

(B). Let $x$ equal the mean of $r$ sample values from $N(\nu, \sigma^2)$ and each of
$y_1, \cdots, y_n$ equal the mean of $s$ sample values from $N(\mu, \sigma^2)$. The significance
test for $\nu > \mu$ at significance level $\alpha$ is

\[ \text{Accept } \nu > \mu \text{ if } x > \bar{y} + g_\alpha \sqrt{s\left( \frac{n}{r} + \frac{1}{s} \right)}\, [y_{(n)} - y_{(1)}]. \]

The test of $\nu < \mu$ is given by

\[ \text{Accept } \nu < \mu \text{ if } x < \bar{y} - g_\alpha \sqrt{s\left( \frac{n}{r} + \frac{1}{s} \right)}\, [y_{(n)} - y_{(1)}]. \]

(C). Let $x$ equal the mean of $r$ sample values from $N(\nu, \sigma^2)$, each of $y_1, \cdots, y_n$
equal the mean of a sample of size $s$ from $N(\mu, \sigma^2)$, $z$ equal the mean of a sample
of size $t$ from $N(\mu, \sigma^2)$, and $C_1$ be a given constant value. Then the significance
test of $\nu > \mu$ at significance level $\alpha$ is

Accept $\nu > \mu$ if

\[ x > (1 - C_1)\bar{y} + C_1 z + [y_{(n)} - y_{(1)}]\, g_\alpha \sqrt{ns\left( \frac{1}{r} + \frac{C_1^2}{t} \right) + (1 - C_1)^2}. \]

The significance test to decide whether $\nu < \mu$ is

Accept $\nu < \mu$ if

\[ x < (1 - C_1)\bar{y} + C_1 z - [y_{(n)} - y_{(1)}]\, g_\alpha \sqrt{ns\left( \frac{1}{r} + \frac{C_1^2}{t} \right) + (1 - C_1)^2}. \]

Values of $g_\alpha$ for $\alpha = .05$ are given in Table I. These values were listed by
Daly in [1].*

* Values of $g_\alpha$ for $\alpha = .05, .025, .01, .005, .001,$ and $.0005$ are listed in Table 9 of [2] for
sample sizes from 2 to 20.



4. Derivation of tests. As tests (A) and (B) are particular cases of test
(C) it is sufficient to derive test (C).

TABLE I
Estimated Values of $g_{.05}$

  n     g_.05
  3     .882
  4     .526
  5     .385
  6     .309
  7     .260
  8     .227
  9     .202
 10     .183
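The tabulated constants make the range tests easy to apply by machine. The sketch below is a hedged reading of tests (A) to (C), not a transcription: the multiplier of the range is taken to be $\sqrt{ns(1/r + C_1^2/t) + (1 - C_1)^2}$, which is what the variances $\sigma^2/r$, $\sigma^2/s$, $\sigma^2/t$ of $x$, $y_i$, $z$ give when $x - (1 - C_1)\bar{y} - C_1 z$ is scaled against the range of the $y$'s (it reduces to $\sqrt{n + 1}$ when $r = s$ and $C_1 = 0$). The function name, argument names and returned labels are illustrative.

```python
import math

# Values of g at the .05 level for n = 3, ..., 10, as listed in Table I.
G_05 = {3: 0.882, 4: 0.526, 5: 0.385, 6: 0.309,
        7: 0.260, 8: 0.227, 9: 0.202, 10: 0.183}


def range_test(x, ys, r=1, s=1, z=None, t=1, c1=0.0, table=G_05):
    """Apply both one-sided range tests and report which, if either, accepts.

    x  : mean of r values from the new population
    ys : n means, each of s values from the old population
    z  : optional mean of t further values from the old population, weight c1
    """
    n = len(ys)
    if n not in table:
        raise ValueError("no tabulated g value for this n")
    if c1 != 0.0 and z is None:
        raise ValueError("z is required when c1 is non-zero")
    y_bar = sum(ys) / n
    centre = (1 - c1) * y_bar + (c1 * z if z is not None else 0.0)
    # Assumed scale factor; see the lead-in for how it was obtained.
    factor = math.sqrt(n * s * (1.0 / r + c1 ** 2 / t) + (1 - c1) ** 2)
    margin = table[n] * factor * (max(ys) - min(ys))
    if x > centre + margin:
        return "accept nu > mu"
    if x < centre - margin:
        return "accept nu < mu"
    return "no decision"


past = [9.8, 10.1, 10.0, 10.3, 9.9]
print(range_test(10.6, past), "|", range_test(10.2, past))
```

Each one-sided test is at the .05 level on its own; reporting both outcomes from one call is a convenience of this sketch rather than a two-sided test.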


Let the quantities $x', y_1', \cdots, y_n', z'$ be defined by

\[ x' = \frac{(x - \nu)\sqrt{r}}{\sigma}, \qquad y_i' = \frac{(y_i - \mu)\sqrt{s}}{\sigma}, \quad (i = 1, \cdots, n), \qquad z' = \frac{(z - \mu)\sqrt{t}}{\sigma}. \]

Then $x', y_1', \cdots, y_n', z'$ are independently distributed according to $N(0, 1)$.
Define


r u = g (^Kiy'u ~ y'* + + K 2 Cz , (w — 1, • • • , n). 

It is easily verified that 

iS(r.) = 0, E(rl) = £ [K{ + (1 + C*)Kl - 2 K x + n] 

E(r u r v ) = “,[(! + C 2 )Kl - 2K t + n], ' (w ^ w). 


Thus, if Ki and Iu satisfy the equations 

<3) (/; + VD K ‘ + K '-”'° 

(1 + C'‘)K\ - 2 Ki + 11 - 0, 

the r„ will be independent of y. when y = v. Also they will be independently 
distributed according to N{ 0, 1). 




Rewriting the r u in terms of x, yi , • • • , y n , 2 one obtains 

(4) r u = £a'i 2 /„ — 2/< + K* ^\ x + \/\ z + K * \ O 4 V )J- 

Using (3) the mean of the r u is found to be 
a K*\/r\ 


Kiv 


' - (1 + c y + a |/L - (v - m)]- 


Let r (tt) denote the ?ith largest of n , • • • , r„ . Then from (1) 
a = Pr[f/ (r (n) - r m ) > g J = (* ~ (l + c V 

+ c 2 - (v - fl)j ! (j/(n) - .v<h) > 0.J 

It is easily proved from (3) that 

^ , /1+ <?(_', (Vr + CVW\ 

K,Vr - ± y ^ + a(1 + C ^ )* 

Choosing the positive sign, putting C = — Ci,and lettings = p one obtains 


Pr 


x > (1 — Ci)y 4- Ci z 


4 - ly(n) — ya)\g< 


/TOT^L^yi-* 


verifying the first part of test (C). The second part of test ( C ) is verified by 

K 1 


choosing the negative sign for j£^y r 
the second part of (1)). 


•- (or by repeating the above argument using 


5. Power comparison with t-test. Let $x, y_1, \cdots, y_n$ satisfy the conditions
of test (B) in section 3. Then Student's $t$ using $x, y_1, \cdots, y_n$ is given by

\[ t = \frac{[x - \bar{y} - (\nu - \mu)]\sqrt{n - 1}}{\sqrt{s\left( \frac{1}{r} + \frac{1}{ns} \right) \sum_{i=1}^{n} (y_i - \bar{y})^2}}. \]

The Student t-test based on this value of $t$ furnishes the most powerful test of
$\nu > \mu$ (and $\nu < \mu$) using $x, y_1, \cdots, y_n$. The purpose of this section is to show
that test (B) has approximately the same power as this Student t-test for $n \le 10$.

Daly has shown (see [1]) that if $r_1, \cdots, r_n$ are independently distributed
according to $N(\xi, \sigma^2)$, then the test based on

\[ (\bar{r} - \xi)/(r_{(n)} - r_{(1)}) \]

has approximately the same power for testing $\xi > 0$ (and $\xi < 0$) as the corresponding
Student t-test based on

\[ t = \frac{(\bar{r} - \xi)\sqrt{n(n - 1)}}{\sqrt{\sum_{u} (r_u - \bar{r})^2}} \tag{5} \]

for $n \le 10$.

Using the notation of section 4 let 

= ^K\Vu - 22 Vi + Ki , (m = 1 , • • •, n), 

K-i 

where zr > 0. Then from consideration of (4) with C = 0 it is seen that the r« 

XV.2 

are independently distributed according to $N(\xi, \sigma^2)$, where $\xi$ equals a positive
constant times $(\nu - \mu)$. Following the derivations in section 4 with $C = 0$,
it is seen that the test of $\xi > 0$ with this particular choice of the $r_u$ is identical
with the test of $\nu > \mu$ given in (B) of section 3. Similarly the test of $\xi < 0$ is
identical with the test (B) of $\nu < \mu$. Thus the test (B) has approximately the
same power for testing $\nu > \mu$ (and $\nu < \mu$) as the Student t-test based on the value
of $t$ given in (5) if $n \le 10$. Replacing the $r_u$ in (5) by their values in terms of
$x, y_1, \cdots, y_n$, $n$, $r$, and $s$, it is found that (5) becomes

\[ t = \frac{[x - \bar{y} - (\nu - \mu)]\sqrt{n - 1}}{\sqrt{s\left( \frac{1}{r} + \frac{1}{ns} \right) \sum_{i=1}^{n} (y_i - \bar{y})^2}}. \]

This proves that test (B) is approximately as powerful for testing $\nu > \mu$ and
$\nu < \mu$ as the most powerful test based on the quantities $x, y_1, \cdots, y_n$ if $n \le 10$.
As test (A) is a particular case of test (B), these results also apply to test (A).

REFERENCES

[1] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test,"
Annals of Math. Stat., Vol. 17 (1946), pp. 71-74.

[2] E. Lord, "The use of range in place of standard deviation in the t-test," Biometrika,
Vol. 34 (1947), pp. 41-67.


ON THE NORM OF A MATRIX 

By Albert H. Bowker 
University of North Carolina 

In studying the convergence of iterative procedures in matrix computation
and in setting limits of error after a finite number of steps, Hotelling [1] used
the square root of the sum of squares of the elements of a matrix as its norm. A
wide class of functions exists which may be employed as norms in matrix calculation
and substituted directly in the expressions derived by Hotelling. The
purpose of this note is to make a few general remarks about this class of functions
and to propose a new norm which appears to have some value in computation.

A function $\phi(A)$ of the elements of a real matrix $A$ may be termed a legitimate
norm if it has the following four properties:

(1) $\phi(cA) = |c|\, \phi(A)$, $c$ a scalar;

(2) $\phi(A + B) \le \phi(A) + \phi(B)$, if $A + B$ is defined;

(3) $\phi(AB) \le \phi(A)\phi(B)$, if $AB$ is defined;

(4) $\phi(e_{ij}) = 1$, where $e_{ij}$ is a fundamental unit matrix
whose elements are all zero except the one in the $i$th row and $j$th column, whose
value is unity.

These four conditions are identical with the first four axioms
of Rella [2], who has shown them to be independent. Properties (1), (2), and
(3) are used directly in investigations of convergence and error, but the importance
of property (4) is indicated by some of its immediate consequences.
Clearly $e_i' A e_j = a_{ij}$, where $e_i$ is a fundamental unit vector. From (3) and (4)
it follows that $|a_{ij}| \le \phi(A)$ for all $i$ and $j$ and we have that

\[ \max_{(i,j)} |a_{ij}| \le \phi(A). \tag{5} \]

Thus $\phi(A)$ has the useful property that the norm of a matrix of errors exceeds
or equals the maximum possible error. Since $\phi(A^m) \le \phi^m(A)$, it follows from
(5) that the elements of $A^m$ will tend to zero as $m$ increases if $\phi(A) < 1$, a result
which is useful in establishing convergence. Also $\phi(A) \ge 0$.

One further consequence of (1) to (4) is of interest. Suppose $A$ is a square
matrix and let $\lambda$ be any of its roots. Then there exists a non-null vector $x$
such that $Ax = \lambda x$. Now $\phi(\lambda x) = |\lambda|\, \phi(x) \le \phi(A)\phi(x)$ and we have

\[ |\lambda| \le \phi(A). \tag{6} \]

Thus, every legitimate norm is an upper bound to the characteristic roots.
Clearly many functions exist which satisfy (1) to (4). The norm used by
Hotelling is $N(A) = \sqrt{\sum_{i,j} a_{ij}^2}$. A new norm which may have some value is
obtained as follows:

\[ R(A) = \max_{(i)} R_i(A) \tag{7} \]

where

\[ R_i(A) = \sum_j |a_{ij}|. \]

Clearly $R(cA) = |c|\, R(A)$. To show that $R$ satisfies (2), consider

\[ R_i(A + B) = \sum_j |a_{ij} + b_{ij}| \le \sum_j |a_{ij}| + \sum_j |b_{ij}| \le R(A) + R(B). \]

Since the above inequality holds for all $i$,

\[ R(A + B) \le R(A) + R(B). \]



Now $AB = \| \sum_\alpha a_{i\alpha} b_{\alpha j} \|$ and

\[ R_i(AB) = \sum_j \Big| \sum_\alpha a_{i\alpha} b_{\alpha j} \Big| \le \sum_j \sum_\alpha |a_{i\alpha}| \cdot |b_{\alpha j}| \le \sum_\alpha |a_{i\alpha}|\, R_\alpha(B) \le R(B) R(A). \]

Hence $R(AB) \le R(A)R(B)$. Clearly $R(e_{ij}) = 1$. Similarly it may be shown
that $C(A) = \max_{(j)} \sum_i |a_{ij}|$ also satisfies the conditions of a norm.
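The three norms are simple to compute and to compare. The following sketch, using plain nested lists and hypothetical function names, evaluates $N(A)$, $R(A)$ and $C(A)$ and spot-checks property (3) and inequality (5) on randomly chosen matrices; a check on a few matrices illustrates the inequalities but of course proves nothing.

```python
import math
import random


def norm_n(a):
    """N(A): square root of the sum of squares of the elements."""
    return math.sqrt(sum(v * v for row in a for v in row))


def norm_r(a):
    """R(A): largest row sum of absolute values."""
    return max(sum(abs(v) for v in row) for row in a)


def norm_c(a):
    """C(A): largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def satisfies_bounds(phi, a, b, tol=1e-12):
    """Check max |a_ij| <= phi(A) and phi(AB) <= phi(A) phi(B) for one pair."""
    largest = max(abs(v) for row in a for v in row)
    return (largest <= phi(a) + tol
            and phi(matmul(a, b)) <= phi(a) * phi(b) + tol)


random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
B = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
for name, phi in (("N", norm_n), ("R", norm_r), ("C", norm_c)):
    print(name, phi(A), satisfies_bounds(phi, A, B))
```

Computing all three at once costs little and shows directly which of them, if any, falls below one for a given matrix.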

* 

Since the convergence of an iterative procedure is often proved by the norm
being less than one, since the norm appears in the upper bound for the error
after a finite number of iterations, and since the norm of a matrix of errors is
taken to indicate the magnitude of the errors, a reasonable method of choosing
among several available legitimate norms is to select the smallest. It is natural
to inquire whether an optimum norm in this sense exists; that is, is there a
function $\phi^*(A)$ such that $\phi^*(A)$ possesses properties (1) through (4) and such
that $\phi^*(A) \le \phi(A)$ for all other $\phi(A)$ satisfying these conditions. Assume such
a $\phi^*(A)$ does exist. Clearly $\phi^*(A) = \phi^*(A')$, as, if either exceeded the other,
the smaller could be taken as $\phi^*(A)$. Let $\lambda^2$ be the largest root of $AA'$. Then
by (6)

\[ \lambda^2 \le \phi^*(AA') \le \phi^{*2}(A) \quad \text{and} \quad \lambda \le \phi^*(A). \]

But Rella [2] has shown that $\lambda$ possesses (1) to (4). Thus

\[ \phi^*(A) = \lambda. \]

But, for a row vector, $C(A) \le \lambda$. Consequently, no minimal norm exists. It is
interesting to note that a worst norm does exist, namely $P(A) = \sum_{i,j} |a_{ij}|$.
Since $A = \sum_{i,j} a_{ij} e_{ij}$, $\phi(A) \le P(A)$. Clearly $P(A)$ satisfies (1) to (4) and hence
is the worst possible legitimate norm.

In practical computation, the choice so far is between $N(A)$ and $R(A)$ (or
$C(A)$). No general inequalities exist and it would probably be advisable to
compute both. $R(A)$ may be less than $N(A)$ and indicate convergence when
$N(A)$ fails to do so. Often $R(A)$ may be computed visually and convergence
proved without computing the sum of squares of the elements.

The functions $N(A)$ and $R(A)$ may also be useful in finding a simple first
approximation to $A^{-1}$. A sufficient condition that Hotelling's iterative method
for finding the inverse of a matrix $A$ will converge is that the roots of
$D = 1 - AC_0$ be less than one in absolute value where $C_0$ is a first approximation
to $A^{-1}$. If the iterative procedure is to be carried out by a fully automatic
computing machine such as the one described by Alt [3] it may be advisable to
start with a rather poor first approximation which is easy to construct. If $A$
has positive roots and if $M$ is any upper bound to these roots and if $C_0$ is a matrix
with diagonal elements equal to $1/M$ and zeros elsewhere, the iterative procedure
will converge but the norm of $D$ will not necessarily be less than one. From
(6), any legitimate norm may be taken as $M$.
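The crude starting value described here can be tried on a small case. The iteration is not written out in this note; the sketch assumes it is $C_{m+1} = C_m(2 - AC_m)$ (often called the Newton-Schulz iteration), for which the error matrix $1 - AC_m$ is squared at every step. It takes $M = R(A)$ and $C_0$ with $1/M$ on the diagonal; the function names and the fixed number of steps are illustrative.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def max_row_sum(a):
    """R(A): largest row sum of absolute values, an upper bound to the roots."""
    return max(sum(abs(v) for v in row) for row in a)


def iterative_inverse(a, steps=20):
    """Approximate the inverse of a square matrix with positive roots.

    Starts from C_0 = (1/M) * identity with M = R(A) and applies the assumed
    update C <- C (2 I - A C) a fixed number of times.
    """
    n = len(a)
    m = max_row_sum(a)
    c = [[(1.0 / m if i == j else 0.0) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        ac = matmul(a, c)
        correction = [[(2.0 if i == j else 0.0) - ac[i][j] for j in range(n)]
                      for i in range(n)]
        c = matmul(c, correction)
    return c


A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric with positive roots
C = iterative_inverse(A)
print(C)                        # close to (1/11) * [[3, -1], [-1, 4]]
print(matmul(A, C))             # close to the identity
```

Here the roots of $1 - AC_0$ are $1 - \lambda/M$ with $0 < \lambda \le M$, so they lie in $[0, 1)$ and the squared errors die out quickly even though the start is poor.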




Finally, it is interesting to point out the relation of this note to some work on
the problem of finding upper bounds to the roots. In fact, the inequalities
$|\lambda| \le N(A)$ and $|\lambda| \le R(A)$, which are consequences of (6), are Theorem 2 of
Farnell [4] and Theorem 3 of Barankin [5] respectively.

REFERENCES 

[1] Harold Hotelling, "Some new methods in matrix calculation," Annals of Math. Stat.,
Vol. 14 (1943), pp. 1-34.

[2] T. Rella, "Über den absoluten Betrag von Matrizen," International Congress of
Mathematicians at Oslo (1936).

[3] Franz L. Alt, "Multiplication of matrices," Math. Tables and Aids to Computation, Vol. 2
(1946), pp. 12-13.

[4] A. B. Farnell, "Limits for the characteristic roots of a matrix," Bull. Amer. Math. Soc.,
Vol. 50 (1944), pp. 789-794.

[5] Edward W. Barankin, "Bounds for the characteristic roots of a matrix," Bull. Amer.
Math. Soc., Vol. 51 (1945), pp. 767-770.


DEFINITION OF THE PROBABLE DEVIATION 

By M. Fréchet

Faculty of Science, University of Paris

The probable deviation has recently been defined by E. J. Gumbel [1], [2]
as the smallest of the intervals corresponding to the probability 1/2. It so happened
that the author was led to an equivalent definition starting from a general
idea which may be applied to absolutely general cases and which, for this reason,
might be of interest.

In recent years, the author has been occupied with a study of random elements
of any nature (curves, surfaces, functions, qualitative elements), a study
whose future seems promising [3]. I gave a definition of the mean of such an
element expressed by an abstract integral which, however, is only defined if the
random element is situated in a metric vectorial (Wiener-Banach) space.¹ But²
a still more general definition is valid if the random element is placed in any
metric space. It consists of taking, as mean position of the random element $X$,
a fixed (non-statistical) element $b = \bar{X}$ such that the function of $a$ which
represents the mean $M(X, a)^2$ of the squared distance of $X$ to the fixed element $a$
is minimum for $a = b$. (In the case where $X$ and $a$ are numbers, and where
$M(X)^2$ is finite, we know that this minimum is reached and that there is one,
and only one, determination $b$ of $a$.) This definition has the advantage of also
defining the equiprobable position of $X$. This is a fixed element $c$ such
that $M(X, a)$ is minimum for $a = c$. (If $X$ and $a$ are numbers, we know that
this minimum is still reached, but may be so reached by several values of $a$.)

Since reading Gumbel’s paper, a still more general definition suggested itself. 


1 For the definition of metric vectorial spaces see [4]. 
* See Note 2, p. 503 of [4]. 




The expressions $M(X, a)$ and $\sqrt{M(X, a)^2}$ themselves may be considered as
distances, but as distances of two random elements taken together. To each
of these distances corresponds as minimum, when a varies, a different “typical” 
function ]£ or X • • •. Thus, without supposing anything about the space 
into which the different trials place X , we assume that we have defined a “de¬ 
viation” of two random elements X , Y taken together. We represent this 
function of two random variables by (([X], [7])), a notation which differs from 
the representation of the distance (X, Y) of the two positions X and Y with 
respect to a single trial. The lower boundary of the deviation (([X], [a])), a 
function of a, which is reached for a = X defines a “typical” position X. More¬ 
over, the value of this (([X], [X])) may be considered as a measure or, at least, 
as a numerical ranging point of the dispersion of X. 

Let us abandon these generalities. They hold especially if the element X 
is a real valued random variable. Among the possible and reasonable 3 expres¬ 
sions for the deviation (([X], [a])) of the numerical variate X from a fixed number 
a, we may use the equiprobable value of | X — a | which may be called the equi- 
probable deviation of X from a. Thus we have, on one side, a new “typical 
value” of X which will be a value of a such that the equiprobable deviation of X 
from a is minimum, and a new measure of dispersion which is the value of this 
minimum and which might be called simply the equiprobable deviation of X. 

In the case where X has everywhere a continuous and finite density of prob¬ 
ability w(X) we find, as typical value, what Gumbel calls the “midvalue” 
and represents by £, and, as equiprobable deviation, what Gumbel calls the 
“probable deviation” and represents by f. 

We may also consider the discontinuous case, which was given as a problem
to candidates of the “Certificat d’Études Supérieures de Calcul des Probabilités,
Option Statistique Mathématique, Session May-June, 1944.” They had to
solve various questions of which I cite the beginning below:

“Consider $n$ real numbers $x_1 \le x_2 \le \cdots \le x_n$ and represent, by $E_a$, a median
value of the deviations $|x_k - a|$ of the numbers $x_k$ and $a$. If $a$ varies, $E_a$ has
a minimum $E$ which is reached by one or several values $A$ of $a$.

1) Explain, in a few words, the meaning of the values $E$ and $A$.

2) For simplicity’s sake, suppose that $n$ is odd ($n = 2r + 1$). How should
$E$ and $A$ be calculated practically? (To find the answer, investigate first how
$E_a$ varies if $a$ varies only slightly).

3) In the case where n = 4s + 3 (s is an integer equal to, or larger than, zero) 

show that E ^ - 

where <?i = > <?3 = 
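For odd $n = 2r + 1$, one possible way of carrying out the calculation asked for in question 2 is sketched below in Python. The sketch rests on an observation that is an assumption of this illustration, not a statement quoted from the examination: the median of the deviations $|x_k - a|$ is at most $d$ exactly when at least $r + 1$ of the numbers lie within $d$ of $a$, so that $E$ is half the length of the shortest interval containing $r + 1$ consecutive ordered values and $A$ is its midpoint. The function name is hypothetical.

```python
def equiprobable_deviation(xs):
    """Return (E, A) for an odd-sized sample.

    E is the minimum over a of the median of |x_k - a|;
    A is one value of a at which this minimum is reached.
    """
    xs = sorted(xs)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch only treats odd n = 2r + 1")
    r = n // 2
    # Examine every window of r + 1 consecutive order statistics
    # and keep the shortest one (the first one on ties).
    best = min(range(n - r), key=lambda i: xs[i + r] - xs[i])
    low, high = xs[best], xs[best + r]
    return (high - low) / 2, (high + low) / 2


print(equiprobable_deviation([0, 1, 2, 10, 11]))
```

Only a sort and one pass over the ordered values are needed, which is in keeping with the remark that the determination requires hardly any calculation.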

The study of this new typical value and of this new equiprobable deviation 
has the advantage that their determination is very rapid and requires hardly 


3 See the Remark at end of note.





any calculations. However, we have to note an important inferiority of the
equiprobable deviation of $X$ compared to the mean and the standard deviations
of $X$. If one or the other of the last two deviations is zero, $X$ is a fixed number
(except for a case of probability zero). This property seems to be required
by the intuitive meaning which we attribute to the dispersion, and to every
measure or any mark of it. Now, the equiprobable deviation lacks this property.
If, for instance, $X$ has only three values: 0, 2, 1, the first two with the probability
0.249, and the last with the probability 0.502, the equiprobable deviation of $X$
will be zero, whereas $X$ will be equal to its typical value 1 only with a probability
of 0.502, and not with a probability equal to unity. The same holds
for any distribution for which there is a point with probability exceeding 1/2.
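The three-valued example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the equiprobable value of a finite distribution is taken, by assumption, to be the smallest value whose cumulative probability reaches 1/2. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def equiprobable_value(dist):
    """Smallest value v with P(X <= v) >= 1/2, for dist = {value: probability}."""
    total = Fraction(0)
    for v in sorted(dist):
        total += dist[v]
        if total >= Fraction(1, 2):
            return v
    raise ValueError("probabilities must sum to 1")


def equiprobable_deviation_from(dist, a):
    """Equiprobable value of |X - a|."""
    dev = {}
    for v, p in dist.items():
        d = abs(v - a)
        dev[d] = dev.get(d, Fraction(0)) + p
    return equiprobable_value(dev)


# The distribution of the text: values 0 and 2 with probability 0.249, value 1 with 0.502.
dist = {0: Fraction(249, 1000), 2: Fraction(249, 1000), 1: Fraction(502, 1000)}
mean = sum(v * p for v, p in dist.items())
variance = sum(p * (v - mean) ** 2 for v, p in dist.items())

print(equiprobable_deviation_from(dist, 1), mean, variance)
```

The deviation from the typical value 1 comes out as zero although the variance, 0.498, does not vanish.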

Remark. The definitions of the mean and of the equiprobable position become
meaningless in the case that $M(X, a)$, or $M(X, a)^2$, is infinite. However, we
succeeded in surmounting the difficulty and in reaching definitions which are valid
even in this case. If $X$ is a number, the new definitions become equivalent to
the classical definitions of the mean and equiprobable value. The proofs are
given in two recent articles [5], [6].

REFERENCES

[1] E. J. Gumbel, “Definition of the probable error,” Annals of Math. Stat., Vol. 13 (1942), pp. 110-111.

[2] E. J. Gumbel, “Probable deviation,” Stat. Jour., City College (N.Y.), Vol. 6 (1943), pp. 25-26.

[3] M. Fréchet, “L’intégrale abstraite d’une fonction abstraite d’une variable abstraite et son application à la moyenne d’un élément aléatoire de nature quelconque,” Revue Scientifique, Vol. 82 (1944), pp. 483-512.

[4] M. Fréchet, Les Espaces Abstraits, Gauthier-Villars, Paris, 1928, pp. 125-141.

[5] M. Fréchet, “Les éléments aléatoires de nature quelconque,” Ann. Inst. H. Poincaré, 1947.

[6] M. Fréchet, “Nouvelles définitions de la valeur moyenne et des valeurs probables d’un nombre aléatoire,” Ann. Univ. de Lyon, 1947.


THE GENERAL RELATION BETWEEN THE MEAN AND THE MODE 
FOR A DISCONTINUOUS VARIATE 

By M. Fréchet

Faculty of Science, University of Paris

Dr. Gumbel has pointed out that one of the author’s arguments employed in
several particular cases (see [1]) can be employed in a general case which includes
them and leads to the following result: If a statistical variate $R$ has only positive
integer values differing from zero, and if its mean value $\bar{R}$ is smaller than, or
equal to, unity, the same holds for its equiprobable value $\tilde{R}$ and its mode $\hat{R}$.
There are two generalizations of this result which might be of interest:

1) On the one hand, the author has shown [2] that, if a variate $R$ can only
have values (integer or not) equal to, or larger than, zero, its equiprobable value
$\tilde{R}$ is, at most, equal to twice its mean value $\bar{R}$, and the inequality $\tilde{R}/\bar{R} \le 2$
cannot be improved, which means that the upper boundary of the first member
is exactly equal to (and not less than) two. The equality is reached when $R$
has only two values of equal probability, one of them being zero.

2) On the other hand, if $R$ is an integer variate equal to, or larger
than, zero, it can be proven that, if $\bar{R} \le \alpha$, we have

(1) $\hat{R} \le \frac{\alpha(\alpha + 3)}{2}$.

Here, $\bar{R}$ and $\hat{R}$ stand for the mean and for the mode of $R$ respectively, and $\alpha$ is
a positive integer differing from zero. For example: if $R$ is the number of
repetitions of an event with probability $p$, we have, for $n$ trials, $\bar{R} = np$, whence,
if $\alpha$ is the first integer number equal to, or larger than, $\bar{R}$ we have the inequality
(1) for the most probable number of repetitions. Naturally, this inequality
only has an interest if the second member of (1) is smaller than $n$, which means
that

$\alpha(\alpha + 3) < 2n$.

This presupposes

$2n > np(np + 3)$, that is, $n < \frac{2 - 3p}{p^2}$,

and, since $n$ must be positive,

$p < \frac{2}{3}$.

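To prove the inequality (1), let us write $\omega_\nu$ for the probability that $R = \nu$.
We have

$\sum_{\nu=0}^{\infty} \omega_\nu = 1$; $\sum_{\nu=0}^{\infty} \nu\,\omega_\nu = \bar{R} \le \alpha$

whence

(2) $\sum_{\nu=0}^{\alpha} (\alpha - \nu)\,\omega_\nu \ge \sum_{\nu=\alpha+1}^{\infty} (\nu - \alpha)\,\omega_\nu$.

Let the mode be

$\hat{R} = \beta$;

then

$\omega_\beta \ge \omega_\nu$; $\nu = 0, 1, 2, \cdots$

and the first member in (2) is bounded by

(3) $\sum_{\nu=0}^{\alpha} (\alpha - \nu)\,\omega_\nu \le \omega_\beta \sum_{\nu=0}^{\alpha} (\alpha - \nu) = \frac{\alpha(\alpha + 1)}{2}\,\omega_\beta$.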




Now, either $\alpha < \beta$ or $\beta \le \alpha$. In the first case the second member in (2) leads to

(4) $\sum_{\nu=\alpha+1}^{\infty} (\nu - \alpha)\,\omega_\nu \ge (\beta - \alpha)\,\omega_\beta$

since the second member in (4) is one of the terms occurring in the sum. The
same inequality holds in the second case, $\beta \le \alpha$, hence it holds generally. It
follows from (2), (3), and (4) that

$\frac{\alpha(\alpha + 1)}{2}\,\omega_\beta \ge (\beta - \alpha)\,\omega_\beta$.

The probability $\omega_\beta$ is certainly different from zero, since $\sum_{\nu=0}^{\infty} \omega_\nu = 1$. Consequently

$\beta - \alpha \le \frac{\alpha(\alpha + 1)}{2}$

or

$\beta \le \frac{\alpha(\alpha + 3)}{2}$

as stated in (1).

The equality in (1) is possible only if, from (3),

$\alpha(\omega_\beta - \omega_0) + (\alpha - 1)(\omega_\beta - \omega_1) + \cdots + (\omega_\beta - \omega_{\alpha-1}) = 0$

and from (4)

$\omega_{\alpha+1} + 2\omega_{\alpha+2} + \cdots + (\beta - \alpha)\,\omega_\beta + \cdots = (\beta - \alpha)\,\omega_\beta$

whence

(5) $\omega_0 = \omega_1 = \cdots = \omega_{\alpha-1} = \omega_\beta$

and

(5') $\omega_{\alpha+1} = \omega_{\alpha+2} = \cdots = 0$ (apart from $\omega_\beta$ itself).

The existence of the exceptional case proves that the inequality (1) cannot
be improved by replacing the second member by a smaller function of $\alpha$. In
the exceptional case, the only possible values of $R$ are

$R = 0, 1, 2, \cdots, \alpha - 1, \alpha, \beta$,

and all values, except perhaps $\alpha$, are equiprobable. The probability $\omega_\alpha$ may
be, but need not be, equal to $\omega_\beta$.

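Moreover

(6) $\beta = \frac{\alpha(\alpha + 3)}{2} \ge \alpha$

and $\beta = \alpha$ is possible only if $\alpha = \beta = 0$ whence, from (5), $\omega_\nu = 0$ except for
$\nu = 0$ which means that $R$ only has one value equal to zero. Except for this
trivial case, we have in the exceptional case $\beta > \alpha$, and there are $\alpha + 2$ possible
values for $R$. Then we must have

$\omega_\beta \ge \omega_\alpha$; $\sum_{\nu=0}^{\alpha} \omega_\nu + \omega_\beta = 1$

whence

$(\alpha + 1)\,\omega_\beta + \omega_\alpha = 1$

and, from (5),

$\alpha \ge \bar{R} = \omega_\beta \sum_{\nu=0}^{\alpha-1} \nu + \beta\,\omega_\beta + \alpha\,\omega_\alpha = \omega_\beta \left( \frac{\alpha(\alpha - 1)}{2} + \frac{\alpha(\alpha + 3)}{2} \right) + \alpha\,\omega_\alpha = \alpha\left((\alpha + 1)\,\omega_\beta + \omega_\alpha\right)$

whence

(7) $\bar{R} = \alpha$.

From

$1 = (\alpha + 1)\,\omega_\beta + \omega_\alpha \ge (\alpha + 2)\,\omega_\alpha$

follows

(8) $\omega_\alpha \le \frac{1}{\alpha + 2}$; $\omega_\beta = \frac{1 - \omega_\alpha}{\alpha + 1}$.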


These conditions (5), (5'), and (7) are necessary and sufficient for the existence 
of the exceptional case. 
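The exceptional case can be made concrete with a small Python sketch. It builds a distribution satisfying (5), (5') and (8) for a chosen $\alpha \ge 1$ and a chosen $\omega_\alpha$, and then checks that the mean equals $\alpha$ while the largest most probable value equals $\alpha(\alpha + 3)/2$. The function names and the numerical choice ($\alpha = 3$, $\omega_\alpha = 1/10$) are assumptions made for this illustration.

```python
from fractions import Fraction


def extremal_distribution(alpha, w_alpha):
    """Distribution on {0, ..., alpha - 1, alpha, beta} with beta = alpha*(alpha + 3)/2.

    Every value except alpha receives the common probability w_beta;
    alpha receives w_alpha, which must not exceed w_beta.
    """
    if alpha < 1:
        raise ValueError("this sketch assumes alpha >= 1")
    w_alpha = Fraction(w_alpha)
    w_beta = (1 - w_alpha) / (alpha + 1)
    if not 0 <= w_alpha <= w_beta:
        raise ValueError("need 0 <= w_alpha <= 1/(alpha + 2)")
    beta = alpha * (alpha + 3) // 2
    dist = {v: w_beta for v in range(alpha)}
    dist[alpha] = w_alpha
    dist[beta] = w_beta
    return dist


def mean(dist):
    return sum(v * p for v, p in dist.items())


def largest_mode(dist):
    top = max(dist.values())
    return max(v for v, p in dist.items() if p == top)


dist = extremal_distribution(3, Fraction(1, 10))
print(mean(dist), largest_mode(dist))
```

Since the values $0, 1, \cdots, \alpha - 1$ and $\beta$ are equally probable, the mode is not unique here; the sketch reports the largest of the most probable values, which is the one that attains the bound.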

If the equality in (1) is excluded, the mode $\beta$ and the smallest integer number
$\alpha$ which is equal to, or larger than, the mean, are related by

(9) $\beta \le \frac{\alpha(\alpha + 3)}{2} - 1$.


As shown before, this general inequality, valid for any discontinuous variate, 
which can assume only non-negative integer values, cannot be improved without 
assuming specific properties of the distribution. 


REFERENCES 

[1] M. Fréchet, Les probabilités associées à un système d’événements compatibles et dépendants, Hermann et Cie., 1943, Part II, p. 5.

[2] M. Fréchet, “Comparaison de diverses mesures de la dispersion”, Rev. de l’Inst. International de Stat., Vol. 8 (1940), p. 5.





NOTE ON DIFFERENTIATION UNDER THE EXPECTATION SIGN 
IN THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS 


By T. E. Harris 
Princeton University 


Let $z$ be any chance variable and $z_1, z_2, z_3, \cdots$ a sequence of independent
chance variables, each with the same distribution as $z$. Let $Z_N = z_1 + z_2 + \cdots + z_N$.
Let $\phi(t) = Ee^{zt}$ for all complex $t$ for which the latter exists. Let $S_1, S_2, \cdots$
be a sequence of mutually exclusive events such that $S_j$ depends only
on $z_1, z_2, \cdots, z_j$, and $\sum_{j=1}^{\infty} P(S_j) = 1$. Let the chance variable $n$ be defined
as $n = j$ when $S_j$ occurs. Blackwell and Girshick [1], generalizing a result
of Wald [2], showed that if there is a positive constant $M$ such that

(1) $|Z_N| < M$ when $n > N$

then the identity

(2) $E\left[e^{Z_n t}(\phi(t))^{-n}\right] = 1$

holds for all complex $t$ for which $\phi(t)$ exists and $|\phi(t)| > 1$. Wald [3] established
conditions, including the existence of $\phi(t)$ for all real $t$, under which
(2) may be differentiated under the expectation sign an unlimited number
of times.
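The identity (2) can be checked numerically in a simple case. The Python sketch below is an illustration under assumptions that are not part of the note: $z$ takes the values $+1$ and $-1$ with probabilities $p$ and $1 - p$, sampling stops as soon as $Z_N$ reaches an upper or a lower barrier (so that (1) holds with $M$ equal to the larger barrier), and $t$ is real with $\phi(t) > 1$. The function name and all numerical values are hypothetical.

```python
from math import exp


def fundamental_identity(p, upper, lower, t, max_steps=10000):
    """Approximate E[exp(Z_n t) * phi(t)**(-n)] for a +1/-1 walk.

    The walk stops when Z reaches `upper` or `-lower`.
    """
    q = 1.0 - p
    phi = p * exp(t) + q * exp(-t)
    alive = {0: 1.0}  # P(Z_N = z and n > N), starting with N = 0
    total = 0.0
    for step in range(1, max_steps + 1):
        nxt = {}
        for z, prob in alive.items():
            for dz, w in ((1, p), (-1, q)):
                z2 = z + dz
                if z2 >= upper or z2 <= -lower:
                    # Stopping at this step: n = step and Z_n = z2.
                    total += prob * w * exp(z2 * t) * phi ** (-step)
                else:
                    nxt[z2] = nxt.get(z2, 0.0) + prob * w
        alive = nxt
        if sum(alive.values()) < 1e-16:
            break  # the remaining probability of continuing is negligible
    return total


value = fundamental_identity(p=0.6, upper=3, lower=2, t=0.5)
print(value)
```

The printed value should agree with 1 up to rounding error, since the probability of not having stopped decreases geometrically in this bounded case.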

Without assuming the existence of $\phi(t)$ for a real $t$-interval the following result
holds: If (1) is true and if $E(z^k)$ and $E(n^k)$ are both finite, $k$ a positive integer,
then

(3) $E\left\{ \frac{d^k}{ds^k}\left[ e^{isZ_n}(\phi(is))^{-n} \right] \right\}_{s=0} = 0$

where $i = \sqrt{-1}$ and $s$ is real. Certain identities, obtained by differentiating
(2) and putting $t = 0$, can also be obtained from (3). For example, if En = 0,
and if En and Ez both exist then EZ\ = EzEn. 

Let $P_N = P(n \le N)$; $p_N = P(n = N)$. Let $H(j, Z_j)$ and $F(N, Z_N)$ be the
conditional cumulatives of $Z_j$ and $Z_N$ for $n = j$ and $n > N$ respectively. Now
(2) was derived by Wald [2], p. 285, from a relation, valid whenever $\phi(t)$ exists,
which in the present notation becomes

(4) $\sum_{j=1}^{N} p_j \int_{-\infty}^{\infty} (\phi(t))^{-j} e^{Z_j t}\, dH(j, Z_j) + \frac{1 - P_N}{(\phi(t))^{N}} \int_{-\infty}^{\infty} e^{Z_N t}\, dF(N, Z_N) = 1$.


Examination of Wald’s derivation of (4) shows it to be valid under the present
hypotheses. Now the finiteness of $E(z^k)$ clearly implies that of $E(Z_j^k \mid n = j)$.
Also, since $F(N, Z_N)$ is constant outside the interval $[-M, M]$, the integral
$\int_{-\infty}^{\infty} Z_N^h\, dF(N, Z_N)$ is finite. Hence we may set $t = is$ in (4) and differentiate
$k$ times, obtaining for all real $s$

(5) $\sum_{j=1}^{N} p_j \int_{-\infty}^{\infty} \frac{d^k}{ds^k}\left[ (\phi(is))^{-j} e^{iZ_j s} \right] dH(j, Z_j) + (1 - P_N) \sum_{r=0}^{k} \binom{k}{r} \frac{d^r}{ds^r}\left[ (\phi(is))^{-N} \right] \cdot \int_{-\infty}^{\infty} (iZ_N)^{k-r} e^{iZ_N s}\, dF(N, Z_N) = 0$.


The derivatives of $(\phi(is))^{-N}$ are sums of terms of the form $Q(N) \cdot (\phi(is))^{-N-r}$
times terms independent of $N$, where $Q(N)$ is a polynomial in $N$ of degree $\le k$.
For any $r \le k$,

$\lim_{N \to \infty} \left| (1 - P_N) N^r \right| = \lim_{N \to \infty} N^r \sum_{j=N+1}^{\infty} p_j \le \lim_{N \to \infty} \sum_{j=N+1}^{\infty} j^r p_j = 0$,

since $En^k$ is finite. Hence $\lim_{N \to \infty} (1 - P_N) Q(N) = 0$. Because of (1) the integrals
in the second term of (5) are bounded as $N \to \infty$. Now set $s = 0$ in (5)
and then let $N \to \infty$. Since $\phi(0) = 1$, the second term of (5) approaches 0
and the limit of the first term is just the left side of (3).

For the case of a Wald sequential process, Stein [4] has shown that all moments
of $n$ are finite. In this case (3) holds whenever $Ez^k$ is finite.


REFERENCES 

[1] David Blackwell and M. A. Girshick, “On functions of sequences of independent chance vectors, with applications to the problem of the random walk in k dimensions,” Annals of Math. Stat., Vol. 17 (1946), p. 310.

[2] Abraham Wald, “On cumulative sums of random variables,” Annals of Math. Stat., Vol. 15 (1944), p. 283.

[3] Abraham Wald, “Differentiation under the expectation sign in the fundamental identity of sequential analysis,” Annals of Math. Stat., Vol. 17 (1946), p. 493.

[4] Charles Stein, “A note on cumulative sums,” Annals of Math. Stat., Vol. 17 (1946), p. 498.


A UNIQUENESS THEOREM FOR UNBIASED SEQUENTIAL 
BINOMIAL ESTIMATION 

By L. J. Savage 1 
University of Chicago 

In a recent note [1], J. Wolfowitz extended some of the results of a paper by
Girshick, Mosteller and Savage [2] on sequential binomial estimation. The
present note carries one of Wolfowitz's ideas somewhat further. The nomenclature
of [1] and [2] will be used freely. The concept of “doubly simple region”
introduced in [1] and assumed there only in the hypothesis of Theorem 3, will
here be shown to be unnecessarily restrictive. In so doing, we find that simplicity
is not only a necessary (cf. Theorem 4 of [2]) but also a sufficient condition
that $\hat{p}$ be the unique unbiased estimate of $p$ for a closed region.

1 The author is a Rockefeller fellow at the Institute of Radiobiology and Biophysics,
University of Chicago.

Lemma. If R is simple there is at most one bounded unbiased estimate of any 
given function of p. 

Proof. If the lemma were false, there would be a non-trivial bounded unbiased
estimate of zero, i.e., $m(\alpha)$ such that $|m(\alpha)|$ is bounded by a constant
$m^*$,

(1) $E(m(\alpha) \mid p) = \sum m(\alpha) k(\alpha) p^y q^x \equiv 0$,

and $m(\alpha)$ not identically zero. Since $R$ is simple we may assume (much as in
the proof of Theorem 6 of [2]) that we have a boundary point $\alpha_0 = (x_0, y_0)$ such that
$m(\alpha_0) \ne 0$, $\alpha_0$ is below all accessible points of its own index and also below
every other $\alpha$ for which $m(\alpha) \ne 0$. Therefore

(2) $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} = \left| \sum_{y > y_0} m(\alpha) k(\alpha) p^y q^x \right| \le m^* \sum_{y > y_0} k(\alpha) p^y q^x$.

Let $M$ denote the set of all accessible points and boundary points at which
$x < x_0$ and $y = y_0 + 1$. There are at most $x_0$ points in $M$, say $\beta_1, \cdots, \beta_l$.
Considering the way in which $\alpha_0$ has been chosen, every path from (0, 0) to an $\alpha$
for which $y > y_0$ passes through or to at least one point of $M$. Therefore when
$y > y_0$

(3) $P(\alpha) = k(\alpha) p^y q^x = P(\alpha \mid M) P(M) = P(\alpha \mid M) \sum_{i=1}^{l} k(\beta_i) p^{y_0+1} q^{x_i} \le p^{y_0+1} \sum_{i=1}^{l} k(\beta_i) P(\alpha \mid M)$.

From inequalities (2) and (3),

(4) $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} \le m^* p^{y_0+1} \sum_{i=1}^{l} k(\beta_i) \left[ \sum_{y > y_0} P(\alpha \mid M) \right] \le m^* p^{y_0+1} \sum_{i=1}^{l} k(\beta_i)$.

But it is impossible that (4) should be satisfied for small $p$.

Combining the Lemma with Theorem 4 of [2] we have the 

Theorem. A necessary and sufficient condition that $\hat{p}(\alpha)$ be the unique proper
(bounded) and unbiased estimate of $p$ for a closed region $R$ is that $R$ be simple.

The sufficiency part of this Theorem extends Theorem 3 of [1] from doubly 
simple regions to simple regions. 

The author is indebted to J. Wolfowitz for his valuable suggestions in connec¬ 
tion with the present note. 





REFERENCES 

[1] J. Wolfowitz, “On sequential binomial estimation,” Annals of Math. Stat., Vol. 17 (1946), pp. 489-493.

[2] M. A. Girshick, Frederick Mosteller, and L. J. Savage, “Unbiased estimates for certain binomial sampling problems with applications,” Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.


ACKNOWLEDGEMENT OF PRIORITY 

By H. E. Robbins 
University of North Carolina 

At the time of publication of my papers on the measure of a random set
(Annals of Math. Stat., Vol. 15 (1944), pp. 70-74; Vol. 16 (1945), pp. 342-347),
I was unaware that the theorem on page 72 of the first paper, which
affords a means of computing the expected value of the measure, had already
been found by A. Kolmogoroff (Grundbegriffe der Wahrscheinlichkeitsrechnung,
Ergebnisse der Mathematik, Berlin, 1933, p. 41). I wish to take this
opportunity of acknowledging Kolmogoroff’s priority, which was pointed out
by Prof. Henry Scheffé.



ABSTRACTS OF PAPERS 

Presented on January 25, 1947, at the Atlantic City meeting of the Institute. 

1. A Test of Significance of the Coefficient of Rank Correlation for more than 
Thirty Ranked Items. Nilan Norris, Hunter College. 

Hotelling and Pabst (Annals of Math. Stat., Vol. 7 (1936), p. 37) have suggested the use
of the Tchebycheff inequality as an approximation for testing the significance of the
coefficient of rank correlation in cases where the number of ranked items is too large to enable
exact probabilities to be computed directly. A table prepared in accordance with this
suggestion indicates that for values of the coefficient of rank correlation larger than .50
there is a wide range of corresponding numbers of ranked items greater than thirty for
which at least the five per cent level of significance is satisfied.

For certain types of applications the conservativeness of the Tchebycheff test may be
a virtue rather than a limitation.
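The Tchebycheff bound behind such a table can be sketched in Python. The sketch assumes the standard moments of the rank correlation coefficient under the hypothesis of no association for $n$ untied items (mean 0, variance $1/(n - 1)$), so that the probability of a coefficient at least as large as $r$ in absolute value is at most $1/((n - 1) r^2)$. The function names are hypothetical, and the table mentioned in the abstract is not reproduced.

```python
from fractions import Fraction


def tchebycheff_bound(r, n):
    """Upper bound on P(|rank correlation| >= r) for n ranked items, r != 0."""
    r = Fraction(r)
    return min(Fraction(1), 1 / ((n - 1) * r * r))


def smallest_n(r, level):
    """Smallest number of items for which the bound does not exceed `level`."""
    n = 2
    while tchebycheff_bound(r, n) > Fraction(level):
        n += 1
    return n


print(smallest_n(Fraction(1, 2), Fraction(1, 20)))
```

Under these assumptions a coefficient of .50 is certified at the five per cent level only from 81 items onward, which shows how conservative the bound is.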

2. A Generalized T Measure of Multivariate Dispersion. Harold Hotelling, 
University of North Carolina. 

The problem of combining errors in two or more dimensions to measure the accuracy of
firing and bombing is similar to problems occurring in industrial quality control where
different measures of quality are applied to the same article, and to problems in mental
testing and other fields. If the covariances were known a priori, the solution optimum
in certain senses, for a multivariate normal distribution, would be the use of
$\chi^2 = \sum\sum \lambda_{ij} x_i x_j$, where $[\lambda_{ij}]^{-1}$ is the covariance matrix and $x_i$ is the
deviation in the $i$th dimension. Since
the covariances must in all known practical cases be estimated from a preliminary sample
with (say) $n$ degrees of freedom, $\chi^2$ may be replaced by $T^2 = \sum\sum l_{ij} x_i x_j$, where $[l_{ij}]^{-1}$ is
the estimated covariance matrix. This is the same $T$ introduced by the author in 1931
as a generalization of the Student ratio $t$, and has the same distribution. Upon adding
together the values of $T^2$ for different cases (e.g. for different bombs dropped with the same
bombsight), a combined measure $T_0^2$ of over-all excellence (e.g. of the bombsight), is obtained.
$T_0^2$, like $\chi^2$, can be broken down into components meaningful with respect to the
causal system, specifically in relation to possible sources of excessive discrepancy. Thus,
if $\bar{x}_i$ is the $i$th coordinate of the centroid, or mean point of impact, of $m$ bombs, we may
write $T_M^2 = m \sum\sum l_{ij} \bar{x}_i \bar{x}_j$, $T_D^2 = T_0^2 - T_M^2$. Then $T_D$ is a function only of deviations from
the mean point of impact. Asymptotically (for large $n$), $T_0^2$, $T_M^2$ and $T_D^2$ have the $\chi^2$
distribution with $m$, 2 and $m - 2$ degrees of freedom respectively. But the untrustworthiness
of the $\chi^2$ distribution as an approximation is evident even with $n$ as large as 256, for which
case calculations have been made. The exact distributions of $T_0$ and $T_D$ are ascertained
when the number of variates $p$ is 2, and the probability integrals are expressed as linear
functions of two incomplete beta functions. In fact, $T_0^2/m$ equals the sum of the roots
of a determinantal equation of the form $|A - \lambda B| = 0$, where $A$ and $B$ are sample covariance
matrices with $n$ and $m$ degrees of freedom respectively, and a similar relation holds for $T_D^2$
with $m$ replaced by $m - 2$. $T$ and $T_M$ have the distribution published in 1931, with probability
integral expressible in terms of a single incomplete beta function or the variance
ratio distribution. It is shown that such parameters as the circular mean deviation are
best estimated with the help of the $T$ measures, not directly by averaging individual circular
deviations.
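The breakdown of the combined measure into a centroid component and a dispersion component can be illustrated with a short Python sketch for two dimensions. Everything specific in it is an assumption made for the illustration: the function names, the estimated covariance matrix and the two points of impact are hypothetical, and the centroid component is taken as $m$ times the quadratic form in the centroid, which is what makes the remainder depend only on deviations from the centroid.

```python
from fractions import Fraction


def inverse_2x2(s):
    """Inverse of a 2 x 2 matrix given as nested sequences."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(l, x):
    """Sum over i, j of l[i][j] * x[i] * x[j]."""
    return sum(l[i][j] * x[i] * x[j] for i in range(2) for j in range(2))


def t_components(points, est_cov):
    """Return (T0^2, TM^2, TD^2) for the given deviations and estimated covariance."""
    l = inverse_2x2(est_cov)
    m = len(points)
    t0_sq = sum(quad_form(l, p) for p in points)
    centroid = [sum(p[i] for p in points) / m for i in range(2)]
    tm_sq = m * quad_form(l, centroid)
    return t0_sq, tm_sq, t0_sq - tm_sq


est_cov = [[Fraction(4), Fraction(0)], [Fraction(0), Fraction(1)]]
points = [(Fraction(1), Fraction(0)), (Fraction(3), Fraction(2))]
print(t_components(points, est_cov))
```

In this example the remainder, 5/2, equals the sum of the quadratic forms in the deviations of the two points from their centroid (2, 1).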

3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood Estimates.
Herman Rubin, Cowles Commission for Research in Economics.

The results of J. L. Doob (Trans. Am. Math. Soc., Vol. 36 (1934), pp. 759-775) on
consistency of maximum likelihood estimates, are generalized and extended to arbitrary
measure spaces. In some special cases, results on asymptotic normality of maximum
likelihood estimates can be generalized to quasi-maximum likelihood estimates (estimates based
on the assumption of a likelihood function which need not be the true function).

4. The Asymptotic Distribution of the Range. E. J. Gumbel, Newark College 
of Engineering. 

The asymptotic distribution of the range $w$ for initial unlimited distributions of the
exponential type is obtained by convolution of the asymptotic distributions of the two
extremes. Let $\alpha$ and $u$ be the parameters of the distributions of the extremes for a
symmetrical variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then the probability
$\Psi(R)$ of the reduced range is subject to the differential equation $\Psi'' + \Psi' - \Psi \exp(-R) = 0$
which may be transformed into Bessel’s equation of the first order by the substitutions
$R = 2(\log 2 - \log z)$, and $\Psi = zU$. The solution is $\Psi(R) = zK_1(z)$ for the asymptotic
probability, and $\psi(R) = (z^2/2)K_0(z)$ for the asymptotic distribution, $K_0(z)$ and $K_1(z)$ being the
modified Bessel functions of the second kind of orders zero and unity. Thus tables of $\Psi(R)$
and $\psi(R)$ may be calculated for any symmetrical distribution of the exponential type.
The distribution of the range $w$ for normal samples of size 10 is already very close to the
asymptotic distribution provided that the parameters $\alpha$ and $u$ are determined from the
mean and the standard deviation of the range. This method permits the calculation of
the distribution of the range for normal samples of any size larger than 10.

5. The Corner Test for Association. John W. Tukey, Princeton University,
and Paul S. Olmstead, Bell Telephone Laboratories.

Construction. In a scatter diagram, draw the two medians, that is, the median of the
x values without regard to the values of y, and the median of the y values without regard
to the values of x. Think of the four quadrants thus formed as being labelled +, −, +, −
in order, so that the two positive quadrants lie along one diagonal and the two negative
along the other. Beginning at the right-hand side of the diagram, count in along the
observations until forced to cross the horizontal median. Write down the number of
observations met before this crossing, attaching the sign, +, if they lay in the + quadrant,
and the sign, −, if they lay in the − quadrant. Repeat this process, moving up from
below, moving to the right from the left, and moving down from above. The quantity to
be used in the test is the algebraic sum of the four numbers thus written down.

Distribution. The exact distribution of this quantity when no association is present
and no two x’s and no two y’s are alike is almost independent of sample size over the range
of values where it is apt to be used. For example, a sum of 9 or more is expected less than
one time in ten for all samples of size 6 or more; a sum of 15 or more, less than one time in
100 for all samples of size 10 or more; and a sum of 21 or more, less than one time in 1000
for all samples of size 14 or more. Even for infinite sample size, the sums for these fractions
become only 9, 14, and 19, respectively.

Extensions. The same ideas that underlie the outside corner test for two variables 
may be extended in several ways to give tests for various types of association among three 
or more variables. 
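The construction described above can be sketched in Python as follows. The sketch assumes an even number of observations with no two x’s and no two y’s alike, so that no observation falls on a median, and it attaches to each run the sign of the quadrant of the outermost observation. The function name and the sample data are hypothetical.

```python
from statistics import median


def corner_sum(points):
    """Algebraic sum of the four signed corner counts for (x, y) pairs."""
    mx = median(x for x, _ in points)
    my = median(y for _, y in points)

    def run(ordered, other_coord, other_median):
        # Count from the outside in until an observation lies on the
        # opposite side of the other median from the outermost one.
        first_side = other_coord(ordered[0]) > other_median
        count = 0
        for pt in ordered:
            if (other_coord(pt) > other_median) != first_side:
                break
            count += 1
        x, y = ordered[0]
        sign = 1 if (x - mx) * (y - my) > 0 else -1
        return sign * count

    def get_x(p):
        return p[0]

    def get_y(p):
        return p[1]

    total = run(sorted(points, key=get_x, reverse=True), get_y, my)  # from the right
    total += run(sorted(points, key=get_y), get_x, mx)               # from below
    total += run(sorted(points, key=get_x), get_y, my)               # from the left
    total += run(sorted(points, key=get_y, reverse=True), get_x, mx)  # from above
    return total


increasing = [(k, k) for k in range(1, 7)]
decreasing = [(k, 7 - k) for k in range(1, 7)]
print(corner_sum(increasing), corner_sum(decreasing))
```

For six observations lying exactly on a rising line every run has length 3 with sign +, giving the largest possible sum for that sample size; on a falling line the signs are all reversed.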

6. Consistent Estimates Based on Partially Consistent Observations, with
Particular Reference to Structural Relations. J. Neyman and Elizabeth
L. Scott, University of California.





Let $\{X_n\}$ be a sequence of independent random variables and let $F_i$ denote the
distribution of $X_i$. Each distribution $F_i$ is assumed to depend on unknown parameters. If a
parameter $\theta$ appears in an infinity of distributions $F_i$, it is called structural. Otherwise,
it is incidental. The sequence $\{X_n\}$ is called consistent if $\{F_i\}$ has no incidental parameters.
$\{X_n\}$ is called partially consistent if $\{F_i\}$ has both structural and incidental parameters.

The problem of fitting a straight line when both variables are subject to errors is that of a
partially consistent series of observations. Let $\xi$ and $\eta = \alpha + \beta\xi$ be two linearly connected
quantities, perhaps related to particular stars, where $\alpha$ and $\beta$ are unknown. The values
$\xi_i$ and $\eta_i$ corresponding to the $i$th star, $(i = 1, 2, \cdots, s)$, are unknown. The observations
provide measurements $x_{ij}$ of $\xi_i$, $(j = 1, 2, \cdots, m_i)$, and measurements $y_{ik}$,
$(k = 1, 2, \cdots, n_i)$, of $\eta_i$. Both $m_i$ and $n_i$ are bounded and small. On the other hand, $s$ may be
considered as increasing without limit.

Assume that the $x_{ij}$ and the $y_{ik}$ are normally
distributed with variances $\sigma_1^2$ and $\sigma_2^2$ and means $\xi_i$ and $\eta_i$ respectively. Then the totality
of observations will form a partially consistent system with the structural parameters $\alpha$, $\beta$,
$\sigma_1$ and $\sigma_2$ and with $\xi_i$ as incidental parameters.

If the observable random variables are only
partially consistent, then the maximum likelihood estimates of the structural parameters
(a) need not be consistent, (b) even if they are consistent and asymptotically normal,
alternative estimates may exist which have the same properties but smaller asymptotic
variances.

Consistent estimates of structural parameters may be obtained from “modified”
equations of maximum likelihood. The lower bound of the variance of estimates of
structural parameters, provided by the Cramér-Rao inequality, is attained only on certain
conditions which are both necessary and sufficient.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Dr. Paul H. Anderson has been appointed Economic Analyst with the Marketing
Division, Office of Domestic Commerce, Department of Commerce, Washington.

Dr. Gilbert W. Beebe is now with the Division of Medical Sciences, National
Research Council, Washington.

Professor Harald Cramér, Director of the Institute of Mathematical Statistics
of the University of Stockholm, was awarded the degree of Doctor of Science,
honoris causa, by Princeton University on February 22, 1947. Professor Cramér
has acted as Visiting Professor of Mathematics at Princeton University and
Yale University during the academic year 1946-47. He will be at the University
of California at Berkeley during the 1947 Summer Session.

Dr. Paul M. Densen has accepted a position with the Division of Medical
Research Statistics, Bureau of Medicine and Surgery, Veterans Administration,
Washington.

Mr. M. V. Divatia is now in charge of the office of the Statistician and Economic
Adviser and Under-Secretary to the Government of Sind, Karachi,
India.

Mr. Clarence B. Fine, formerly with the Office of Price Administration, has 
transferred to the Bureau of Old-Age and Survivors Insurance, Social Security 
Administration, where he is employed as a Sampling Expert. 

Prof. Charles C. Grove was appointed Visiting Lecturer in Mathematics at 
the University of Pennsylvania for the spring semester. 

Assoc. Prof. E. E. Haskins of Northeastern University has been appointed to 
an assistant professorship at the Army Air Forces Institute of Technology, 
Wright Field, Dayton, Ohio. 

Prof. Roger Lessard of the Hull Technical School has accepted a position at 
the Ecole Polytechnique, Montreal. 

Mr. Edward D. Lowery is now a member of the Research Department, Winchester
Arms Company, New Haven, Connecticut.

Professor H. B. Mann of Ohio State University has been awarded the Frank
Nelson Cole prize in the Theory of Numbers for 1946.

Dr. Margaret P. Martin has been appointed to an assistant professorship in
the Department of Preventive Medicine and Public Health, Vanderbilt University
Medical School, Nashville, Tennessee.

Dr. A. L. O’Toole is at present employed by the Veterans Administration in
the Washington headquarters, as Acting Chief of the Administrative Analysis
Division in the Research Service. Dr. O’Toole was released from the Navy on
September 23, 1946, to inactive duty in the U. S. Naval Reserve, with the rank
of Commander. Dr. O’Toole served for nearly four years in the Navy, in
important administrative and statistical work for the Commander South Pacific
Area and South Pacific Force. He will be remembered as having been with
Admiral Halsey’s Pacific Fleet, and was awarded the Bronze Star Medal. At
the time of his release, he was Chief Staff Officer for Commander South Pacific
Area and South Pacific Force.

Mr. I. B. Perrott, since his demobilization from the British Army, has been 
Lecturer in Mathematics at the College of Technology and Commerce, Leicester, 
England. 

Mr. J. S. Ripandelli is now with the Actuarial Department of the Jefferson 
Standard Life Insurance Company of Greensboro, North Carolina. 

Dr. Ronald W. Shephard of the University of California has been appointed 
to the staff of the Department of Mathematics, New York University. 

Mr. John R. Stehn is now a member of the Research Laboratory of the General
Electric Company, Schenectady, New York.

Dr. Charles W. Vickery, formerly of Ohio State University, is engaged in work 
as a Research Consultant in New York City. 


Miss Margaret Jeannin Dix, of the University of California Statistical Labora¬ 
tory, died an accidental death at her home in Berkeley on June 20, 1946. 

Mr. Albert M. Freeman, of the Boston Fiduciary and Research Association, 
died May 20, 1946. 

Dr. Walter Schilling, of the Stanford University Hospital, died suddenly in 
San Francisco, December 16, 1946. 


Summer Statistical Session at the University of California at Berkeley 

The important advances in the theory of statistics during the war and especially
the unprecedented growth in the fields of application have created a
strong demand for trained statisticians to fill both the research and the teaching
positions all over the country. Since in many cases the war time education had
to be somewhat sketchy, unsystematic, and not very conducive to a thorough
coverage of the vast material, it is felt that a relatively brief set of courses on a
rather advanced level would be beneficial to many persons, both those who already
hold research or teaching positions in statistics, as well as those who
prepare for higher degrees.

With this object in mind, the University of California at Berkeley is offering
a set of statistical courses during the Summer Session, June 23rd to August 2nd,
1947. There will be three courses: (i) General Theory of Random Variables and
Frequency Distributions, by Harald Cramér of the University of Stockholm;





(ii) Problems of Testing Hypotheses and of Estimation, by J. Neyman, University
of California, Berkeley; and (iii) Seminar Course. The last will be given by
seven scholars, each giving two hours of lectures, as follows:

1. Statistical Astronomy.    R. J. Trumpler

2. Orthogonal Polynomials and Problems of Moments.    G. Szegő

3. Methods of Calculation: (a) Gibbs’ Methods in Statistical Mechanics;
(b) Darwin-Fowler Method of Statistics.    V. F. Lenzen

4. Large Scale Sampling Surveys.    P. C. Mahalanobis

5. Statistical Problems Arising in Nuclear Physics Measurements.    R. Serber

6. Problems of Population Genetics.    S. Emerson

7. Interactions between Industrial Problems and Mathematical Statistics.    H. Scheffé

The purpose of the Seminar Course is to introduce the students either to
branches of pure mathematics contingent on mathematical statistics but not
ordinarily taught in the universities or to various fields of knowledge offering
fruitful fields for statistical studies.


Summer Statistical Session at Virginia Polytechnic Institute

A Summer Statistical Session will be held at Virginia Polytechnic Institute,
Blacksburg, Virginia, August 5 to September 5, 1947. This Session will be
sponsored jointly by Virginia Polytechnic Institute, University of North Carolina,
University of Michigan, Iowa State College, and the Federal Bureau of
Agricultural Economics.

The faculty will consist of: Walter A. Hendricks, B.A.E., U.S.D.A.; Rensis
Likert, University of Michigan; H. L. Lucas, University of North Carolina;
Maurice G. Kendall, England; George W. Snedecor, Iowa State College; Frank
Yates, Rothamsted Experiment Station, England; Earl E. Houseman, B.A.E.,
U.S.D.A.; Raymond J. Jessen, Iowa State College, and Boyd Harshbarger,
Virginia Polytechnic Institute.

The following courses will be offered for credit: Engineering Statistics;
Statistical Methods; Design of Animal Experiments; Schedule Design and Interview
Techniques for Sample Surveys; Sampling Design and Analysis; Mathematical
Theory of Sampling; Seminar; Mathematical Statistics, and Experimental
Design.

In addition to the faculty, probable Seminar speakers are: W. F. Callendar,
W. G. Cochran, Miss Gertrude M. Cox, W. E. Deming, George Gallup, M. H.
Hansen, Harold Hotelling, Arnold King, and Charles F. Sarle.

Inquiries regarding the Summer Session should be addressed to Boyd Harshbarger,
Professor of Statistics, Summer Statistical Session, Virginia Polytechnic
Institute, Blacksburg, Virginia.





New Members 

The following persons have been elected to membership in the Institute 
(January 1 to February 28, 1947): 

Asofsky, Samuel, B.S. (C.C.N.Y.) Stat., National Jewish Welfare Board, 1256 E. IS St., 
Brooklyn SO, N. Y. 

Auer, Richard M., A.M. (Columbia) Instr. in Math., State Teachers Coll., Montclair, 
N. J., 88 No. 16 St., East Orange 

Bakan, David, M.A. (Indiana) Chief Stat., Comm. on Selection and Training of Aircraft 
Pilots, National Research Council, 259 Natatorium, Ohio State Univ., Columbus 10, 
Ohio 

Beatty, Glenn H., A.B. (Ohio State) Grad. student and Fellow, Iowa State College, Station 
A, General Delivery, Ames, Iowa 

Campbell, Wallace A., B.S. (Columbia) Stat. Analyst, War Assets Administration, 483 
Washington Ave., Brooklyn 16, N. Y. 

Cella, Francis R., M.A. (Kentucky) Assoc. Prof. of Statistics and Director, Bur. of 
Business Research, Univ. of Oklahoma, Norman, Okla. 

Chapman, Douglas G., M.A. (Toronto) Asst. Prof. of Math., Univ. of British Columbia, 
Vancouver, Canada 

Cheydleur, Benjamin F., B.A. (Wisconsin) Chief, Mechanized Analysis, Naval Ordnance 
Lab., 602 Avenue E, District Heights, Washington 19, D. C. 

Coombs, Clyde H., Ph.D. (Chicago) Ass't Prof. of Psychology, and Research Psychologist, 
Institute for Human Adjustment, Univ. of Michigan, Ann Arbor, Mich., 1027 E. 
Huron 

Corton, Edward L., Jr., M.B.A. (Chicago) Grad. student, Iowa State Coll., 80S Hodge 
Ave., Ames, Iowa 

Davis, Harold., A.B. (Brooklyn Coll.) Stat., Navy Dept., 416—88 St., S.E., Washington, 
D. C. 

Dutton, Arthur M., B.S.E.E. (Iowa State) Grad. Fellow, Mathematics Dept., Iowa State 
Coll., Ames, Iowa 

Fay, Edward A., A.M. (Harvard) Grad. student, Univ. of California, Berkeley, 415 South 
17th St., Apt. SB, Richmond, Calif. 

Flanagan, John C., Ph.D. (Harvard) Prof. of Psychology, Univ. of Pittsburgh, 
Pittsburgh 13, Pa. 

Gardner, Eric F., Ed.M. (Boston Teachers) Teaching Fellow and Milton Fellow, Grad. 
School of Educ., Harvard Univ., Cambridge, Mass., Walker Home, 40 Quincy St. 

Gerende, Lincoln J., C.Ph.M., U. S. Navy, Naval Medical Res. Institute, National Naval 
Medical Center, Bethesda 14, Md. 

Grossman, Evelyn, M.A. (Columbia) Stat., U. S. Dept. of Agriculture, 6401—14 St., 
N. W., Washington 12, D. C. 

Hill, Edwin A., Jr., M.A. (Columbia) Instr. in Math., Coll. of the City of N. Y., 50 West 
67 St., New York 28, N. Y. 

Horton, H. Burke, M.B.A. (Texas) Senior Transport Analyst, 2906 Naylor Rd., S. E., 
Washington 20, D. C. 

Horvitz, Daniel G., B.S. (Mass. State) Grad. student, Iowa State Coll., 21S7 Country Club 
Blvd., Ames, Iowa 

Ikhtiar-ul-Mulk, S. M., M.A. (Punjab, India) Grad. student, Princeton Univ., Graduate 
College, Princeton, N. J. 

Jaeger, Carol M., B.A. (Dubuque) Statistician, 1800 Columbia Terrace, Peoria 5, Ill. 

Jessen, Raymond J., Ph.D. (Iowa State) Res. Assoc. Prof., Iowa State College, and 
Agric. Statistician, U.S.D.A., Statistical Lab., Iowa State Coll., Ames, Iowa 

Kinzer, Mrs. Lydia Greene, M.A. (Kansas) Ass’t Instr. in Math., Ohio State Univ., 
585 East Town Street, Columbus 15, Ohio 





Langenhop, Carl E., M.S. (Iowa State) Instr. in Math., Iowa State Coll., Apt. 3, Cranford 
Annex, Ames, Iowa 

Lowy, Melitta E., A.B. (Hunter) Statistician, Grad. student, Columbia Univ., 646 West 
End Ave., New York 25, N. Y. 

Mattila, Sakari, Fil.Mag. (Helsinki) High School of Commerce, Helsinki, Finland 

Mayerson, Allen L., B.S. (Michigan) Grad. student and Teaching Fellow, Univ. of Mich., 
1S02 Packard St., Ann Arbor, Mich. 

McCreary, Garnet E., M.A. (Queen’s Univ.) Research Fellow, Statistical Lab., Iowa 
State Coll., Ames, Iowa 

McMillan, Olan T., M.A. (Michigan) Instr. in Math., Michigan State Coll., East Lansing, 
Mich. 

Morris, Edward B., A.B. (Indiana) Statistician, U. S. Bur. of Labor Statistics, 1916 Ridge 
Place S. E., Washington 20, D. C. 

Moshman, Jack, B.A. (New York) Tutor in Math., Queens Coll., Flushing, N. Y., 126-09 
Liberty Ave., Richmond Hill 19 

Natrella, Mrs. Mary G., B.A. (Pennsylvania) Statistician, Bureau of Ships, Navy Dept., 
1210—12th St., N. W., Washington 5, D. C. 

Neal, T. Ellison, A.B. (Geo. Washington) Statistician, Textile Dev. Dept., U. S. Rubber 
Co., Hogansville, Ga. 

Noble, Carl E., Ph.D. (Iowa) Quality Methods Engineer, Kimberly Clark Corp., 
Lakeview Mill, Neenah, Wis. 

Ostle, Bernard, M.A. (British Columbia) Teaching Ass’t, School of Bus. Adm., Univ. of 
Minnesota, Minneapolis, Minn. 

Oxtoby, Toby B., B.A. (Iowa) Grad. Ass’t, Dept. of Psychology, State Univ. of Iowa, 
Iowa City, Iowa 

Peisakoff, Melvin P., Student, Princeton Univ., S4 North West College, Princeton, N. J. 

Rothschild, Colette, (École Normale Supérieure) Attachée de Recherches au Centre 
National de la Recherche Scientifique, 46 rue Madame, Paris VIe, France 

Slonim, Morris J., M.B.A. (Harvard) Statistician, Bureau of Labor Statistics, 210 Wayne 
Place S. E., Washington 20, D. C. 

Soler, Reuben I., B.B.A. (C.C.N.Y.) Statistician, Food and Drug Administration, 246 
Portland St., S. E., Washington, D. C. 

Stouffer, Samuel A., Ph.D. (Chicago) Prof. of Sociology and Director of the Laboratory 
of Social Relations, Emerson Hall, Harvard Univ., Cambridge, Mass. 

Teicher, Henry, B.A. (Iowa) Graduate student, Columbia Univ., 139 Osborne Terrace, 
Newark, N. J. 

Tiedeman, David V., M.A. (Rochester) Instr. in Educ., Grad. School of Educ., Harvard 
Univ., Walker House, 40 Quincy St., Cambridge 38, Mass. 

Tintner, Gerhard, Ph.D. (Vienna) Prof. of Economics and Mathematics, Iowa State 
Coll., Ames, Iowa 

Weiss, Eleanor S., Ed.M. (Boston Teachers) Teaching Fellow, Grad. School of Educ., 
Harvard Univ., 2006 Commonwealth Ave., Brighton 36, Mass. 

Wilson, William A., Jr., A.B. (California) Teaching Ass’t in Psychology, Univ. of Calif., 
Berkeley 4, Calif. 

Woodell, Allan D., A.B. (N. Y. State Teachers, Albany) Graduate student in math., Univ. 
of Mich., 426 Church St., Ann Arbor, Mich. 


Omitted from 1946 lists of new members: 

Féraud, Prof. Lucien, Faculté des Sciences Économiques et Sociales, Univ. de Genève, 
24 rue Henri Mussard, Genève, Switzerland 



REPORT ON THE ATLANTIC CITY MEETING OF THE INSTITUTE 

The Ninth Annual Meeting of the Institute of Mathematical Statistics was 
held at Atlantic City, New Jersey, on Friday and Saturday, January 24-25,1947. 
The meeting was held in conjunction with meetings of the American Economic 
Association, American Statistical Association, and the Econometric Society. 
The following 154 members of the Institute attended the meeting: 

Beatrice Aitchison, F. L. Alt, R. L. Anderson, T. W. Anderson, K. J. Arrow, Max 
Astrachan, B. M. Bennett, Joseph Berkson, A. J. Berman, C. I. Bliss, Paul Boschan, A. E. 
Brandt, M. F. Bresnahan, Philip Brown, O. P. Bruno, R. W. Burgess, O. K. Buros, B. H. 
Camp, F. R. Cella, Uttam Chand, K. L. Chung, C. W. Churchman, P. C. Clifford, W. J. 
Cobb, W. G. Cochran, F. G. Cornell, D. R. Cowan, Harald Cramér, J. H. Curtiss, J. F. Daly, 
G. B. Dantzig, D. G. Deihl, D. B. DeLury, B. W. Dempsey, H. F. Dorn, F. W. Dresch, 
A. J. Duncan, David Durand, P. S. Dwyer, Churchill Eisenhart, W. D. Evans, Will Feller, 
C. D. Ferris, Irving Fisher, L. R. Frankel, M. A. Geisler, Leon Gilford, M. A. Girshick, 
C. H. Graves, K. E. Greene, S. W. Greenhouse, F. E. Grubbs, E. J. Gumbel, Margaret 
Gurney, Louis Guttman, Trygve Haavelmo, K. W. Halbert, M. H. Hansen, Miriam S. 
Harold, T. E. Harris, Boyd Harshbarger, Bernard Hecht, Wassily Hoeffding, H. B. Horton, 
Harold Hotelling, E. E. Houseman, Helen M. Humes, Leonid Hurwicz, Seymour Jablon, 
R. W. James, R. J. Jessen, H. L. Jones, Alice S. Kaitz, H. B. Kaitz, L. S. Kellogg, H. S. 
Konijn, Tjalling Koopmans, C. F. Kossack, R. L. Kozelka, D. H. Leavens, Howard Levene, 
J. E. Lieberman, Rensis Likert, S. B. Littauer, Irving Lorge, P. J. McCarthy, P. W. 
McGann, F. E. McIntyre, H. F. MacNeish, J. D. Maddrill, Jacob Marschak, Max Millikan, 
A. M. Mood, Mrs. Margaret Moore, J. W. Morse, J. E. Morton, Frederick Mosteller, D. N. 
Nanda, P. M. Neurath, Jerzy Neyman, M. L. Norden, Nilan Norris, H. W. Norton, P. S. 
Olmstead, E. G. Olds, Sophie Rakesky, Chester Rapkin, Olav Reiersol, W. A. Reynolds, 
P. R. Rider, C. F. Roos, A. C. Rosander, Ernest Rubin, Herman Rubin, P. J. Rulon, Frank 
Saidel, Marion M. Sandomire, Max Sasuly, F. K. Satterthwaite, E. D. Schell, E. M. Schrock, 
D. H. Schwartz, G. R. Seth, L. W. Shaw, W. A. Shewhart, J. H. Smith, R. T. Smith, Leslie 
E. Simon, Milton Sobel, C. M. Stein, G. T. Steinberg, Joseph Steinberg, H. W. Steinhaus, 
F. F. Stephan, A. P. Stergion, M. S. Stevens, G. J. Stigler, S. A. Stouffer, Zenon Szatrowski, 
B. J. Tepping, J. W. Tukey, D. F. Votaw, Jr., Helen M. Walker, J. H. Watkins, Louis 
Weiner, Samuel Weiss, S. S. Wilks, Elizabeth W. Wilson, C. P. Winsor, J. Wolfowitz, M. A. 
Woodbury, Holbrook Working, C. A. Wright, and T. O. Yntema. 

The first session, a joint session with the Econometric Society and the Bio¬ 
metrics Section of the American Statistical Association, was held at two o’clock 
on Friday afternoon, and was devoted to the topic, Applications of Statistical 
Techniques to Agricultural Economics. Holbrook Working of Stanford Uni¬ 
versity presided. The following four papers were presented: 

1. Use of Variance Components in the Analysis of Market Differentials in Hog Prices. 
R. L. Anderson, University of North Carolina. 

2. An Application of the Analysis of Variance in the Economic Evaluation of Production. 
Boyd Harshbarger, Virginia Polytechnic Institute. 

3. A Model of the Economic Interdependence between Agriculture and the National Economy. 
Trygve Haavelmo, Cowles Commission for Research in Economics. 

4. The Reduced-Form Method for Estimating Simultaneous Economic Relationships. 
M. A. Girshick, Bureau of the Census. 



The session concluded with a discussion of these papers by T. W. Anderson, 
Columbia University; Milton Friedman, University of Chicago; and Harold 
Hotelling, University of North Carolina. 

At 8 o’clock on Friday evening there was a joint session with the Econometric 
Society and the American Statistical Association, on the topic, When is the 
Analysis of Variance Useful in Economic Research? Arthur R. Tebbutt of 
Northwestern University presided, and the following three papers were presented: 

1. The Advantages of the Analysis of Variance for Research and Managerial Control 
Purposes. 
Harry Pelle Hartkemeier, University of Missouri. 

2. Estimation of Economic Relationships and Multivariate Regression. 
Leonid Hurwicz, Iowa State College. 

3. Nonstandard Forms of Variance Analysis. 
W. Allen Wallis, University of Chicago. 

There was discussion of these papers by Tjalling Koopmans, Cowles Commission 
for Research in Economics; Gerhard Tintner, Iowa State College; and J. W. 
Tukey, Princeton University. 

At 10 o’clock on Saturday morning there was a joint session with the American 
Statistical Association devoted to the topic, Use of Ordered Observations in 
Statistical Analysis, with Harold Hotelling of the University of North Carolina 
as chairman. The following two papers were presented: 

1. Estimation of Parameters by Use of Order Statistics. 

Frederick Mosteller, Harvard University. 

2. Tolerance Limits. 

Jacob Wolfowitz, Columbia University. 

There was discussion of these papers by John H. Smith, Bureau of Labor Sta¬ 
tistics; Howard L. Jones, Illinois Bell Telephone Company; and J. W. Tukey, 
Princeton University. 

At the Saturday morning session one contributed paper of the Institute of 
Mathematical Statistics was also presented, by E. J. Gumbel, Newark College 
of Engineering, on the topic: The Asymptotic Distribution of the Range. 

The Institute’s session at 2 o’clock Saturday afternoon was devoted to contributed 
papers. W. G. Cochran, president of the Institute, presided, and the 
following four papers were presented: 

1. A Test of Significance of the Coefficient of Rank Correlation for More than Thirty Ranked 
Items. 
Nilan Norris, Hunter College. 

2. A Generalized T Measure of Multivariate Dispersion. 
Harold Hotelling, University of North Carolina. 

3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood Estimates. 
Herman Rubin, Cowles Commission for Research in Economics. 

4. The Corner Test for Association. 
J. W. Tukey, Princeton University, and Paul Olmstead, Bell Telephone Laboratories. 





Abstracts of these papers appear elsewhere in this issue. 

Following the session on contributed papers, Professor Jerzy Neyman of the 
University of California gave an invited address on the topic: On Consistent 
Estimates, with Particular Reference to Structural Relations between Several 
Variables All Subject to Random Error. A discussion of this address followed, by 
Miss E. L. Scott, University of California; A. Wald, Columbia University; and 
Tjalling Koopmans, Cowles Commission for Research in Economics. 

The meeting closed with the annual business meeting of the Institute, which 
was held at 5 p.m. on Saturday in Haddon Hall. Reports by the President, 
Secretary-Treasurer, and Editor were followed by the election of officers for 
1947: Will Feller, President; Morris H. Hansen and John H. Curtiss, 
Vice-Presidents; and Paul S. Dwyer, Secretary-Treasurer. 

P. S. Dwyer, 
Secretary. 



ON THE ASYMPTOTIC DISTRIBUTION OF DIFFERENTIABLE 
STATISTICAL FUNCTIONS 

By R. v. Mises 
Harvard University 

Table of Contents 

Introduction 

Part I. Preliminary Theorems 
1. Asymptotically Equal Distributions 
2. Special Class of Statistical Functions: Quantics 
3. Asymptotic Expectation of Excess Power Products 
4. Asymptotic Expectation and Variance of Quantics 
5. Final Statement on the Limit of Expectation of Quantics 
6. Theorem on Products of n Functions 

Part II. Differentiable Statistical Functions 
1. Definitions 
2. Taylor Development 
3. General Theorem 
4. Illustrations 

Part III. Second-Type Asymptotic Distribution 
1. Statement of the Problem 
2. Characteristic Function 
3. Asymptotic Value of $Q_n(u)$ 
4. Asymptotic Value of $P_n(x)$ 
5. Transition to the Continuous Case 

References 

Introduction. If n real variables $x_1, x_2, \cdots, x_n$ are subject to a probability 
distribution with the element $dV_1(x_1)\,dV_2(x_2) \cdots dV_n(x_n)$ one can ask for the 
distribution of any function $f$ of $x_1, x_2, \cdots, x_n$. We are primarily interested in 
statistical functions, i.e. in functions that depend on the repartition¹ $S_n(x)$ of the 
n quantities $x_1, x_2, \cdots, x_n$ only. The simplest case is that of the linear 
statistical functions 

(1)    $f = \int \psi(x)\,dS_n(x) = \frac{1}{n}\,[\psi(x_1) + \psi(x_2) + \cdots + \psi(x_n)]$. 

The so-called Central Limit Theorem of Probability Calculus states that the 
distribution of a linear statistical function, if n tends to infinity, approaches 
more and more the normal (Gauss) distribution if some very general conditions 
linking $\psi(x)$ and the $V_\nu(x)$ are fulfilled. It has been shown, ten years ago, [2] 
that the restriction to linear functions here is immaterial. Much more general 
statistical functions tend towards normalcy with increasing n, for example the 
variance of mth order 

(2)    $f = M_m = \int (x - a)^m\,dS_n(x), \qquad a = \int x\,dS_n(x)$ 

and, likewise, such combinations as the Lexis quotient $M_2/a(1 - a/N)$ or Gini’s 
disparity measure $1 - \int (1 - S_n)^2\,dx / a$ or, in the multidimensional case, the 
correlation coefficient, etc. On the other hand, statistical functions are known 
whose distributions assume, asymptotically, a form different from the Gaussian. 
One example is Pearson's Chi-square, another the test function $\omega^2$, introduced 
by H. Cramér [1] and the author [4]: 

(3)    $f = \omega^2 = \int g'(x)\,[S_n(x) - \bar V_n(x)]^2\,dx$ 

where $g'(x) > 0$ and 

(4)    $\bar V_n(x) = \frac{1}{n}\,[V_1(x) + V_2(x) + \cdots + V_n(x)]$. 

N. V. Smirnoff [7, 8] computed the asymptotic distribution of $\omega^2$ for the case 
that all $V_\nu(x)$ and, therefore, $\bar V_n(x)$ equal one and the same distribution 
function $V(x)$. The result differs widely from the Gaussian distribution. 

¹ The function $S_n(x)$ is called the repartition of the real quantities $x_1, x_2, \cdots, x_n$ if 
$nS_n(x)$ is the number of those among the $x_1, x_2, \cdots, x_n$ that are smaller than or equal to $x$. 
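The statistical functions named in the Introduction can be computed directly from a sample. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the $\omega^2$ routine assumes the special case $g'(x) = 1$ with all $V_\nu$ equal to the uniform distribution on (0, 1), for which the integral in (3) has a well-known closed form in the ordered sample values.

```python
from bisect import bisect_right


def repartition(sample):
    """Return S_n as a function: S_n(x) = (number of sample values <= x) / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def linear_statistic(sample, psi):
    """Eq. (1): the integral of psi with respect to S_n, i.e. the average of psi."""
    return sum(psi(x) for x in sample) / len(sample)


def central_moment(sample, m):
    """Eq. (2): M_m, the integral of (x - a)^m dS_n, with a the sample mean."""
    a = linear_statistic(sample, lambda x: x)
    return linear_statistic(sample, lambda x: (x - a) ** m)


def omega_squared_uniform(sample):
    """Eq. (3) under the assumptions g'(x) = 1 and V uniform on (0, 1).

    Uses the closed form 1/(12 n^2) + (1/n) * sum (x_(i) - (2i - 1)/(2n))^2
    over the ordered sample values x_(1) <= ... <= x_(n).
    """
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n * n) + sum(
        (x - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    ) / n


sample = [1.0, 2.0, 2.0, 5.0]
S = repartition(sample)
```

Each routine takes only the sample values, which reflects the defining property of a statistical function: it depends on the observations through $S_n(x)$ alone.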

In order to understand all this it is necessary to consider $f$ as a function 
defined in the space of distributions $V(x)$ (or in a sub-space of it). Then, the 
variable $f$ whose distribution is sought is the value of $f\{V(x)\}$ at the “point" $S_n(x)$ 
and should be written as $f\{S_n(x)\}$. Such “functions of functions" were first 
introduced by Vito Volterra (1887) and are today a familiar topic of higher 
analysis. The first statement that can be made is that the asymptotic 
distribution of $f\{S_n(x)\}$ depends mainly on the behavior of $f\{V(x)\}$ at the point 
$\bar V_n(x)$ defined by (4). 

Volterra also introduced the notion of derivatives and of Taylor development 
for a “fonction de ligne." Using these concepts a more specific statement can 
be pronounced: The type of asymptotic distribution of a differentiable statistical 
function $f\{S_n(x)\}$ depends on which is the first non-vanishing term in the Taylor 
development of $f\{V(x)\}$ at the point $\bar V_n(x)$; if it is the linear term the limiting 
distribution is normal, under restrictions that can easily be derived from the Central 
Limit Theorem; in other cases higher types of asymptotic distributions result. 

The present paper tries to establish this theorem and to furnish preliminary 
information about the asymptotic distribution of the second type. 

If both the function $f\{V(x)\}$ and the sequence of distributions $V_1(x), V_2(x), 
V_3(x), \cdots$ are defined independently of each other, it cannot be presumed that 
the derivative of $f$ vanishes at $\bar V_n(x)$. In this sense the normal distribution 
appears as the “general case" of an asymptotic distribution while the higher types 
represent certain “singularities." In the case of type $m$, ($m = 1, 2, 3, \cdots$), 
the distribution of the expression 

(5)    $n^{m/2}\,[f\{S_n(x)\} - f\{\bar V_n(x)\}]$ 

tends towards a function of bounded mean value and variance. For $m = 1$ 
it is a Gauss function with mean value 0 and finite variance. For any uneven 
$m$ the distribution is symmetrical with respect to the zero point. If $f$ is given, 
the limiting distribution is essentially determined if in addition to $\bar V_n(x)$ one 
function of two variables, $U_n(x, y)$, is known, 

(6)    $U_n(x, y) = \frac{1}{n} \sum_{\nu=1}^{n} [V_\nu(x) - V_\nu(x)V_\nu(y)], \qquad (x \le y)$ 

       $\phantom{U_n(x, y)} = \frac{1}{n} \sum_{\nu=1}^{n} [V_\nu(y) - V_\nu(x)V_\nu(y)], \qquad (x \ge y)$. 

For instance, in the case of the linear function ($m = 1$) defined in eq. (1), the 
(second order) variance of (5) is found as the Stieltjes integral 

(7)    $\iint \psi(x)\psi(y)\,dU_n(x, y)$ 

and no mean values of higher order are required for computing the moments of 
any order, whatever $m$ is. 
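As an illustration of (6) and (7), the following sketch evaluates the Stieltjes integral in the simplest setting. It rests on assumptions not made in the text: all $V_\nu$ equal one discrete distribution with finitely many atoms, so that $U(x, y) = V(\min(x, y)) - V(x)V(y)$ places mass $p_i\delta_{ij} - p_i p_j$ on the pair of atoms $(i, j)$ and the double integral reduces to a finite double sum. The function names are hypothetical.

```python
def u_increments(p):
    """Mass that U(x, y) of eq. (6) assigns to each pair of atoms (i, j).

    Assumes every V_nu equals one discrete distribution with atom
    probabilities p, so that dU[i][j] = p_i * delta_ij - p_i * p_j.
    """
    k = len(p)
    return [
        [(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
        for i in range(k)
    ]


def variance_from_u(psi_values, p):
    """Eq. (7) as a double sum: sum over i, j of psi_i * psi_j * dU[i][j]."""
    dU = u_increments(p)
    k = len(p)
    return sum(
        psi_values[i] * psi_values[j] * dU[i][j]
        for i in range(k)
        for j in range(k)
    )


def variance_direct(psi_values, p):
    """Ordinary variance of psi(X), for comparison with eq. (7)."""
    mean = sum(v * q for v, q in zip(psi_values, p))
    return sum(q * (v - mean) ** 2 for v, q in zip(psi_values, p))
```

Under these assumptions the two routines agree, which is the familiar statement that $\sqrt{n}$ times a centered sample average of $\psi$ has the variance of $\psi(X)$.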

For m = 2 the complete expression for the characteristic function of the asymp¬ 
totic distribution of (5) is developed in Part III of this paper. It has the form 

(8) D(ui) 

where $D(\lambda)$ is in general the Fredholm determinant of a symmetrical kernel that 
depends on the second derivative of $f\{V(x)\}$ at $V = \bar V_n$, on $\bar V_n$ and on $U_n$. 
If the $V_\nu(x)$ are discontinuous distributions with saltus at $k$ distinct points only, 
$D$ is the determinant of a quadratic form of $k$ variables. This happens to be 
the case with Pearson's $\chi^2$ while the $\omega^2$ distribution found by Smirnoff represents 
a fairly general case of the asymptotic distribution of second type. 

PART I. PRELIMINARY THEOREMS 

1. Asymptotically equal distributions. Let $K_1, K_2, K_3, \cdots$ be an infinite 
sequence of collectives, $k_n$ the number of variables in $K_n$ and $A_n$, $B_n$ two 
functions of these variables, ($n = 1, 2, 3, \cdots$). The cumulative distribution 
functions of $A_n$ and $B_n$ will be denoted by $P_n(x)$ and $Q_n(x)$ respectively, i.e. 

(1)    $P_n(x) = \mathrm{Prob}\{A_n \le x\}, \qquad Q_n(x) = \mathrm{Prob}\{B_n \le x\}$ 

and the expectation of $|A_n - B_n|$ by 

(2)    $E_n\{|A_n - B_n|\}$ 

all these quantities being taken with respect to the distribution in $K_n$. 





Two functions $F_n(x)$ and $G_n(x)$ both depending on the parameter $n$ are said 
to be asymptotically equal if 

(3)    $\lim_{n \to \infty} |F_n(x) - G_n(x)| = 0$ uniformly in $x$. 

If this is the case for the cumulative distribution functions $P_n(x)$ and $Q_n(x)$ of 
$A_n$ and $B_n$ we shall also say that $A_n$ and $B_n$ have the same asymptotic 
distribution. Eq. (3) will also be written as $F_n(x) \sim G_n(x)$. The following can be 
proved: 

Lemma A. If with increasing $n$ the expectation of the absolute difference 
between $A_n$ and $B_n$ tends towards zero and if one of the functions $P_n(x)$ or $Q_n(x)$ is 
asymptotically equal to a function $F_n(x)$ that has a uniformly bounded derivative, 
i.e. 

(4)    $\lim_{n \to \infty} E_n\{|A_n - B_n|\} = 0, \qquad \frac{dF_n(x)}{dx} < M$ for all $n$ 

then $A_n$ and $B_n$ have the same asymptotic distribution. 

This statement, in a slightly different wording, was proved in an earlier paper 
[2] and the proof will not be repeated here. If one of the various definitions for 
“stochastical convergence” is used, one can also say that $A_n$ and $B_n$, under the 
stated conditions, converge stochastically towards each other. 

The Lemma A can be extended and modified in various ways. First, it is 
obvious that the expectation of $|A_n - B_n|$ can be replaced by that of any 
positive power of $|A_n - B_n|$. With respect to $F_n$ one could ask for the existence 
of a bounded derivative in all points except for a zero set only. Then $P_n$ and 
$Q_n$ would still converge everywhere except for this zero set and the definition 
of asymptotically equal distributions could be extended to this case. In the 
present paper this will not be done as it is not our purpose to strive for results 
of the possibly greatest generality. 

2. Special class of statistical functions: quantics. Preliminary to the study 
of general statistical functions a special class which corresponds to quantics 
(homogeneous polynomials) of mth order must be discussed. Let $V_1(x), V_2(x), 
V_3(x), \cdots$ be the cumulative distribution functions in a sequence of 
one-dimensional collectives $C_1, C_2, C_3, \cdots$ and $S_n(x)$ the repartition of a sample drawn 
from the n-dimensional collective $K_n$, with the distribution element 

$dV_1(x_1)\,dV_2(x_2) \cdots dV_n(x_n)$. 

We introduce 

(5)    $T_n(x) = S_n(x) - \bar V_n(x), \qquad \bar V_n(x) = \frac{1}{n} \sum_{\nu=1}^{n} V_\nu(x)$. 

Here, $nT_n(x)$ is obviously the excess of observed values $\le x$ over their expected 
number. Quantics of first, second, third, $\cdots$ order are then defined as 

(6)    $f_1\{S_n(x)\} = \int \psi(x)\,dT_n(x)$ 

       $f_2\{S_n(x)\} = \iint \psi(x, y)\,dT_n(x)\,dT_n(y)$ 

       $f_3\{S_n(x)\} = \iiint \psi(x, y, z)\,dT_n(x)\,dT_n(y)\,dT_n(z)$ 

all integrals to be extended over the total range of $x$. Of course, only such $\psi$ 
for which the respective integral exists are admitted. The first, $f_1$, is obviously 
a linear statistical function and the asymptotic distribution of $\sqrt{n}\,f_1$ is, under 
well-known conditions, a Gauss function with the mean value zero and the 
variance given in eq. (7) of the Introduction. In $f_2, f_3, \cdots$ the $\psi$ may be 
supposed to be symmetrical with respect to their variables. It will be seen 
later (Part II, sec. 2) that the first derivative of $f_2$, the first and second 
derivatives of $f_3$, etc. vanish at the point $\bar V_n(x)$. 

All the above functions $f_1, f_2, f_3, \cdots$ can be considered (if the $\psi$ are 
continuous) as the limits of ordinary quantics in $k$ variables. Choose $k$ disjoint 
intervals $I_1, I_2, \cdots, I_k$ on the x-axis, and call $I_{k+1}$ their complement. Denote the 
increment of $V_\nu(x)$ within $I_\kappa$ by $p_{\nu\kappa}$ and the increment of $S_n(x)$ by $p_{n\kappa}$. 
Obviously $p_{\nu\kappa}$ is the probability, within $C_\nu$, of $x$ falling in the interval $I_\kappa$ and $np_{n\kappa}$ 
is the number of observed sample values in the same interval. We introduce 
the excess values: 

(7)    $\xi_\kappa = p_{n\kappa} - \bar p_\kappa, \qquad \bar p_\kappa = \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\kappa}$ 

and form the sums 

(8)    $f_1 = \sum_{\kappa=1}^{k} \psi_\kappa \xi_\kappa, \qquad f_2 = \sum_{\kappa,\lambda}^{1 \cdots k} \psi_{\kappa\lambda}\,\xi_\kappa \xi_\lambda, \qquad f_3 = \sum_{\kappa,\lambda,\mu}^{1 \cdots k} \psi_{\kappa\lambda\mu}\,\xi_\kappa \xi_\lambda \xi_\mu, \quad \cdots$. 

By selecting suitable sets of intervals $I_1, I_2, \cdots, I_k$ and appropriate values 
for the constants $\psi_\kappa, \psi_{\kappa\lambda}, \cdots$, one can approximate the integrals (6) by sums 
of the form (8). 

Our next task will be to find asymptotic values for the expectation and for the 
moments of the quantities defined in (8). Clearly a formula for the expectation 
of a power product $\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots$ where $\alpha, \beta, \gamma, \cdots$ are positive integers, is the 
only thing we need. To arrive at such a formula we replace each of the 
one-dimensional collectives $C_\nu$ by a k-dimensional $C_\nu^*$ in the following way. 

In $C_\nu^*$ the chance variable is a k-dimensional vector which can take $(k + 1)$ 
distinct values only: it can be zero or coincide with the unit vector parallel to 
one of the $k$ axes. To the latter values of the variable we assign the probabilities 
$p_{\nu 1}, p_{\nu 2}, \cdots, p_{\nu k}$ and to the zero the probability 

(9)    $p_{\nu,k+1} = 1 - p_{\nu 1} - p_{\nu 2} - \cdots - p_{\nu k}$. 

This quantity, of course, may vanish. The mean value of $C_\nu^*$ is the point with 
the coordinates $p_{\nu 1}, p_{\nu 2}, \cdots, p_{\nu k}$. 

If the $n$ collectives $C_1^*, C_2^*, \cdots, C_n^*$ are combined, the sum of the $n$ observed 
vector values is a vector with the components $np_{n1}, np_{n2}, \cdots, np_{nk}$. If in 
each $C_\nu^*$ the origin is shifted to the mean value and the coordinates with respect 
to the new origin are called $z_1, z_2, \cdots, z_k$, the sums of the observed $z_1, z_2, \cdots, 
z_k$-values will be $n\xi_1, n\xi_2, \cdots, n\xi_k$ rather than $np_{n1}, np_{n2}, \cdots, np_{nk}$. Thus 
it is seen that all questions concerning the distributions of $\xi_1, \xi_2, \xi_3, \cdots$ can 
be answered on the basis of the well-known rules on the addition of $n$ independent 
chance variables. This leads to the symbolic formula for the expectation: 

(10)    $n^{\alpha+\beta+\gamma+\cdots}\,E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} = \left(\sum_{\nu=1}^{n} z_{\nu 1}\right)^{\alpha} \left(\sum_{\nu=1}^{n} z_{\nu 2}\right)^{\beta} \left(\sum_{\nu=1}^{n} z_{\nu 3}\right)^{\gamma} \cdots$, 

where on the right-hand side each term 

(11)    $z_{\nu\iota}\,z_{\nu\kappa}\,z_{\nu\lambda} \cdots$ 

has to be replaced by 

(11')    $\int z_\iota z_\kappa z_\lambda \cdots\,dV_\nu^*(z)$. 

Here, obviously, $V_\nu^*(z)$ is the distribution function in $C_\nu^*$ and the expressions 
(11') are in fact sums of $(k + 1)$ terms, for example 

(12)    $\int z_1 z_2\,dV_\nu^*(z) = p_{\nu 1}(1 - p_{\nu 1})(-p_{\nu 2}) + p_{\nu 2}(-p_{\nu 1})(1 - p_{\nu 2})$ 

        $\qquad + \sum_{\kappa=3}^{k+1} p_{\nu\kappa}(-p_{\nu 1})(-p_{\nu 2}) = -p_{\nu 1}p_{\nu 2}$. 

It will be seen in the next section that only very few of these sums are needed 
for computing the asymptotic value of (10). Note that the value of (11') can 
be expressed in terms of $p_{\nu 1}, p_{\nu 2}, p_{\nu 3}, \cdots$ alone if $\xi_1, \xi_2, \xi_3, \cdots$ only appear in 
the product. 
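The example (12) can be checked by listing the $(k + 1)$ values of the chance variable explicitly. The sketch below is an illustration only: the function name is hypothetical, indices are 0-based (so interval 1 of the text is index 0), and the probabilities are arbitrary test values.

```python
def centered_product_moment(p, i, j):
    """Expectation of z_i * z_j in the collective C*_nu.

    p lists the probabilities of the k unit vectors; the zero vector
    carries the remaining mass 1 - sum(p), as in eq. (9).  The
    coordinates z are measured from the mean value, so the coordinate
    of index i equals (1 or 0) - p[i].
    """
    k = len(p)
    total = 0.0
    # Outcomes 0..k-1 are the unit vectors; outcome k is the zero vector.
    for outcome in range(k + 1):
        prob = p[outcome] if outcome < k else 1.0 - sum(p)
        z_i = (1.0 if outcome == i else 0.0) - p[i]
        z_j = (1.0 if outcome == j else 0.0) - p[j]
        total += prob * z_i * z_j
    return total
```

For two different intervals the sum collapses to minus the product of the two probabilities, as in (12); for equal indices it gives $p(1 - p)$.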

3. Asymptotic expectation of excess-power products. We first consider the 
case where the sum of exponents $\alpha, \beta, \gamma, \cdots$ is an even number 

(13)    $\alpha + \beta + \gamma + \cdots = 2m$. 

On the right-hand side of (10) stands a sum of $n^{2m}$ terms, each a product of $2m$ 
factors $z_{\nu\kappa}$. It follows from (11') that the absolute value of a product cannot 
surpass 1. The second subscripts are the same in each term: first $\alpha$ ones, then 
$\beta$ twos, $\gamma$ threes, etc. The first subscripts are in each term a combination of 
$2m$ digits out of $\nu = 1, 2, 3, \cdots, n$. The number of those combinations which 
include $s$ different $\nu$-values, ($s = 1, 2, \cdots, 2m$), is 

(14)    $\binom{n}{s} K_s^{(m)} = \binom{n}{s} \left[ s^{2m} - \binom{s}{1}(s-1)^{2m} + \binom{s}{2}(s-2)^{2m} - \cdots \pm \binom{s}{s-1}\,1^{2m} \right]$. 

Obviously, the $K_s^{(m)}$ are bounded (independent of $n$). 

If $s > m$ the combination of first subscripts must include at least one $\nu$-value 
that appears only once. All those products vanish since 

(15)    $\int z_\kappa\,dV_\nu^*(z) = 0$ for all $\kappa, \nu$ 

due to the fact that the origin in the z-space coincides with the mean value of 
the distribution $V_\nu^*(z)$. Note that 

(16)    $\lim_{n \to \infty} \frac{1}{n^m} \binom{n}{s} = 0 \quad (s < m), \qquad = \frac{1}{m!} \quad (s = m)$. 

It follows that the sum of all terms in (10) that correspond to any $s < m$ are 
of the order $o(n^m)$ or smaller. 

Thus, we arrive at an asymptotic expression for $E_n$ by dividing both sides of 
(10) by $n^m$: 

(17)    $n^m E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim \frac{1}{n^m} \sum \left( \prod z_{\nu\kappa} \right)$ 

where only such products on the right-hand side are retained which include 
exactly $m$ different $\nu$-values each appearing twice. 

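In analogy to (12) we compute 

(18)    $\int z_\iota z_\kappa\,dV_\nu^*(z) = -p_{\nu\iota}\,p_{\nu\kappa} \quad (\iota \ne \kappa)$ 

        $\phantom{\int z_\iota z_\kappa\,dV_\nu^*(z)} = p_{\nu\iota}(1 - p_{\nu\iota}) \quad (\iota = \kappa)$ 

and write, for the sake of abbreviation 

(19)    $P_{\iota\kappa}^{(\nu)} = p_{\nu\iota}\,\delta_{\iota\kappa} - p_{\nu\iota}\,p_{\nu\kappa} = P_{\kappa\iota}^{(\nu)}$ 

with the usual meaning of $\delta_{\iota\kappa}$ ($= 0$ if $\iota \ne \kappa$ and $= 1$ if $\iota = \kappa$). Then the sum 
to the right in (17) includes $(2m)!/2^m$ terms, each a product of $m$ factors $P_{\iota\kappa}^{(\nu)}$. 
If each of the $m$ couples $\iota, \kappa$ consists of two different figures, the respective 
product appears $\alpha!\,\beta!\,\gamma! \cdots$ times; if $r$ couples are doubles ($\iota = \kappa$) the multiplicity 
of the term is $2^{-r} \alpha!\,\beta!\,\gamma! \cdots$. Therefore, (17) takes the form 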


( 20 ) 







In this sum the upper indices are any set of $m$ digits out of $1, 2, 3, \cdots, n$ 
and the subscripts are all sets of $m$ couples including $\alpha$ ones, $\beta$ twos, $\gamma$ threes, 
etc. To each such set of $m$ couples belong $\binom{n}{m}$ terms of the sum. The number 
of sets of couples is bounded (independent of $n$). The exponent $r$ is the number 
of doubles ($\iota = \kappa$) among the $m$ pairs. 

The expression (20) admits of a transformation which renders it much more 
suitable. Assume that a set of couples $\iota, \kappa$ has been chosen according to the 
conditions and consider the product 

(21)    $\left( \sum_{\nu=1}^{n} P_{\iota\kappa}^{(\nu)} \right) \left( \sum_{\nu=1}^{n} P_{\iota'\kappa'}^{(\nu)} \right) \cdots\, 2^{-r}$. 

Among the $n^m$ terms which we obtain by developing (21) are all terms appearing 
in the sum (20), each of them repeated $m!$ times and, in addition, 

(22)    $n^m - \binom{n}{m} m! = n^m - n(n-1)(n-2) \cdots (n-m+1)$ 

other products of $m$ factors $P$. Since the difference (22) divided by $n^m$ goes to 
zero with increasing $n$ and each $|P|$ is smaller than 1, the additional terms 
have no importance. We therefore introduce the quantities 

(23)    $P_{\iota\kappa} = \frac{1}{n} \sum_{\nu=1}^{n} P_{\iota\kappa}^{(\nu)} = \delta_{\iota\kappa}\,\frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota} - \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\,p_{\nu\kappa}$. 

Then (20) can be written as 

(24)    $n^m E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim \frac{\alpha!\,\beta!\,\gamma! \cdots}{m!} \sum_{\iota,\kappa} 2^{-r}\,P_{\iota\kappa}\,P_{\iota'\kappa'}\,P_{\iota''\kappa''} \cdots$. 

Here we have a sum of a finite number of terms. It will be supposed in all that 
follows that the $P_{\iota\kappa}$ as defined in (23) do not vanish identically as $n$ increases 
indefinitely. 

Since in the sum (24) no upper indices appear, equal terms repeat themselves. 
We can, therefore, rearrange it, using the polynomial coefficients and absorbing 
at the same time the factor $2^{-r}$. The final form of (24) is given in the following 
Lemma, which also includes a statement for the case of an uneven sum of 
exponents $\alpha + \beta + \gamma + \cdots$. In fact, it is easily seen that if again half the 
sum is called $m$, no group of terms on the right-hand side of (10) exists that 
would supply a finite limit when divided by $n^m$. Thus we arrive at 

Lemma B₁. If $\xi_\kappa$ is the numerical excess of observed over expected quantities 
falling in the interval $I_\kappa$, the asymptotic expectation of the excess-power product 
$\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots$ is given by 

(25)    $(\sqrt{n})^{\alpha+\beta+\gamma+\cdots}\,E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim 0$    if $\alpha + \beta + \gamma + \cdots$ uneven 

        $\sim \alpha!\,\beta!\,\gamma! \cdots \sum \frac{(\frac{1}{2}P_{11})^{\sigma_{11}}\,(\frac{1}{2}P_{22})^{\sigma_{22}} \cdots P_{12}^{\sigma_{12}}\,P_{13}^{\sigma_{13}} \cdots}{\sigma_{11}!\,\sigma_{22}! \cdots \sigma_{12}! \cdots}$    if $\alpha + \beta + \gamma + \cdots$ even 

the sum to be extended over all sets of non-negative integers $\sigma_{11}, \sigma_{22}, \cdots, \sigma_{12}, \cdots$ 
that fulfill the conditions 

(25')    $\sigma_{11} = \frac{1}{2}(\alpha - \sigma_{12} - \sigma_{13} - \cdots), \qquad \sigma_{22} = \frac{1}{2}(\beta - \sigma_{21} - \sigma_{23} - \cdots), \quad \cdots$ 

The $P_{\iota\kappa}$ as defined in (23) depend on two groups of mean values only, namely on 

(25")    $\bar p_\kappa = \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\kappa}$ and $\overline{p_\iota p_\kappa} = \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\,p_{\nu\kappa}$. 

Some properties of the matrix $P_{\iota\kappa}$ will be discussed in the next Section. 

For practical computation, instead of (25), a recursion formula may be used
which follows immediately from (24). Writing simply $(\alpha, \beta, \gamma, \cdots)$ for the sum
in (24) the formula reads

(26) = “ 2 , ft 7, •••)?»+ - 2 , 7 , •••)Ps 2 + ••• 

+ (« - 1,0 - 1 , 7 , ■■ •)?!*+ («,0 ~ 1 , 7 “ 1 ), ■••)?«+ ■••• 

If all the original distributions $V_\nu(x)$ are equal, this recursion formula, and from
it (25), can be derived almost immediately from the theorem on the multiplication
of characteristic functions with the addition of chance variables.

Note that the expectation of the product $\xi_\iota\xi_\kappa$ is $P_{\iota\kappa}/n$ for any value of $n$.
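
The closing remark, that the expectation of $\xi_\iota\xi_\kappa$ equals $P_{\iota\kappa}/n$ exactly for every $n$, can be checked by brute force on a small case. The sketch below is only an illustration: the probabilities `p[v][i]` are hypothetical, and the function names are not from the paper. It builds $P_{\iota\kappa}$ from (23) and compares it with $n\,E\{\xi_\iota\xi_\kappa\}$ obtained by enumerating every outcome of $n = 3$ independent observations.

```python
from itertools import product

def mean_excess_matrix(p):
    """P[i][j] = delta_ij * mean_v p[v][i] - mean_v p[v][i] * p[v][j], as in (23)."""
    n, k = len(p), len(p[0])
    return [[sum((p[v][i] if i == j else 0.0) - p[v][i] * p[v][j]
                 for v in range(n)) / n
             for j in range(k)] for i in range(k)]

def exact_scaled_covariance(p):
    """n * E[xi_i * xi_j], computed by enumerating all outcomes of the n draws."""
    n, k = len(p), len(p[0])
    # Cell index k stands for the complementary interval I_{k+1}.
    full = [row + [1.0 - sum(row)] for row in p]
    pbar = [sum(p[v][i] for v in range(n)) / n for i in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for cells in product(range(k + 1), repeat=n):
        prob = 1.0
        for v, c in enumerate(cells):
            prob *= full[v][c]
        # xi_i = observed relative frequency minus mean probability of cell i
        xi = [sum(1 for c in cells if c == i) / n - pbar[i] for i in range(k)]
        for i in range(k):
            for j in range(k):
                cov[i][j] += prob * n * xi[i] * xi[j]
    return cov

# Hypothetical probabilities p[v][i] for n = 3 distributions and k = 2 intervals.
p = [[0.2, 0.5], [0.4, 0.1], [0.3, 0.3]]
P = mean_excess_matrix(p)
C = exact_scaled_covariance(p)
```

Under these assumptions `P` and `C` agree entry by entry up to rounding, which is the statement that no asymptotic approximation is involved at second order.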


4. Asymptotic expectation and variance of quantics. We first state a characteristic
property of the expression (25) for the expectation of an excess power
product. Let us denote by $C_{\alpha,\beta,\gamma,\cdots}$ the right-hand side of (25) in the case
of even $\alpha + \beta + \gamma + \cdots$. Then, if $C_{\alpha,\beta,\gamma,\cdots}$ is expressed in terms of $P_{\iota\kappa}$ and each
time the subscript 2 is changed into 1, we arrive at the value of $C_{\alpha+\beta,0,\gamma,\cdots}$.

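This would not be the case if $C_{\alpha,\beta,\gamma,\cdots}$ were expressed in terms of $\bar p_\iota$ and $\overline{p_\iota p_\kappa}$, since e.g.

$C_{2,0} = P_{11} = \bar p_1 - \overline{p_1 p_1}, \qquad C_{1,1} = P_{12} = -\overline{p_1 p_2}.$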

In order to prove the statement we observe that the $C_{\alpha,\beta,\gamma,\cdots}$ can be derived
from the coefficients in the development of the $m$th power of a quadric:


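(27)  $\left(\tfrac12\sum_{\iota,\kappa} P_{\iota\kappa}\,t_\iota t_\kappa\right)^{m} = m!\,\sum \frac{C_{\alpha,\beta,\gamma,\cdots}}{\alpha!\,\beta!\,\gamma!\cdots}\; t_1^{\alpha}t_2^{\beta}t_3^{\gamma}\cdots$

It follows that

(27')  $C_{\alpha,\beta,\gamma,\cdots} = \frac{1}{m!}\,\frac{\partial^{2m}}{\partial t_1^{\alpha}\,\partial t_2^{\beta}\,\partial t_3^{\gamma}\cdots}\left(\tfrac12\sum_{\iota,\kappa} P_{\iota\kappa}\,t_\iota t_\kappa\right)^{m}.$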




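If in the subscripts of $P_{\iota\kappa}$ the ones and twos are identified, the quadric becomes
a function of $t_1 + t_2, t_3, t_4, \cdots$ and the derivative with respect to $\partial t_1^{\alpha}\,\partial t_2^{\beta}$ equals
the derivative with respect to $\partial t_1^{\alpha+\beta}$. On the other hand, the latter derivative
corresponds to the value of $C_{\alpha+\beta,0,\gamma,\cdots}$ in the form (27').

Taking $m = 2$, $\alpha = \beta = \gamma = \delta = 1$, eq. (25) supplies

(28)  $n^2 E_n[\xi_\iota\xi_\kappa\xi_\lambda\xi_\mu] \sim P_{\iota\kappa}P_{\lambda\mu} + P_{\iota\lambda}P_{\kappa\mu} + P_{\iota\mu}P_{\kappa\lambda}.$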





According to the above statement this is correct whether $\iota, \kappa, \lambda, \mu$ are or are not
different from each other. Thus, if $\psi_{\iota\kappa\lambda\mu}$ is a symmetric set of constants, we
have

(28')  $n^2 E_n\Big\{\sum \psi_{\iota\kappa\lambda\mu}\,\xi_\iota\xi_\kappa\xi_\lambda\xi_\mu\Big\} \sim 3\sum \psi_{\iota\kappa\lambda\mu}\,P_{\iota\kappa}P_{\lambda\mu}.$

In general, the numerical factor to the right, i.e. the number of sets of couples
drawn from $2m$ figures, is $(2m)!/2^m m! = 1\cdot3\cdots(2m-1)$. Thus we can
state:

Lemma $B_2$. If a quantic $f_{2m}$ is defined according to (8) with symmetric coefficients,
its asymptotic expectation is given by

(29)  $n^m E_n\{f_{2m}\} \sim 1\cdot3\cdot5\cdots(2m-1)\sum \psi_{\iota_1\kappa_1\cdots\iota_m\kappa_m}\,P_{\iota_1\kappa_1}\cdots P_{\iota_m\kappa_m}.$
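
The numerical factor $(2m)!/2^m m! = 1\cdot3\cdots(2m-1)$ counts the ways of splitting $2m$ figures into $m$ unordered couples. A minimal sketch, with illustrative function names that are not part of the paper, enumerates these splittings directly and compares the count with both closed forms.

```python
from math import factorial

def pairings(items):
    """Yield every way of splitting an even-sized list into unordered couples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def odd_product(m):
    """1 * 3 * 5 * ... * (2m - 1)."""
    out = 1
    for j in range(1, 2 * m, 2):
        out *= j
    return out

# Number of sets of couples drawn from 2m figures, for m = 1, ..., 4.
counts = {m: sum(1 for _ in pairings(list(range(2 * m)))) for m in (1, 2, 3, 4)}
```

For $m = 2$ the three couplings are exactly the three products on the right-hand side of (28).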

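Before applying this to the continuous case defined in (6), let us consider some
characteristic properties of the matrix $P_{\iota\kappa}$. According to the definition (19)
of $P^{(\nu)}_{\iota\kappa}$ we have

(30)  $\sum_{\iota,\kappa} P^{(\nu)}_{\iota\kappa}\,t_\iota t_\kappa = \sum_{\iota} p_{\nu\iota}\,t_\iota^2 - \Big(\sum_{\iota} p_{\nu\iota}\,t_\iota\Big)^2$

and using (9) one easily derives from Schwarz' inequality

$\Big(\sum_{\iota} p_{\nu\iota}\,t_\iota\Big)^2 \le \sum_{\iota} p_{\nu\iota}\,t_\iota^2.$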

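Since $P_{\iota\kappa}$ is the arithmetical mean of the $P^{(\nu)}_{\iota\kappa}$ it follows that the matrix $P_{\iota\kappa}$
is at least semi-definite and is positive definite except when all $p_{\nu,k+1} = 0$.
In the latter case (if e.g. the $k$ intervals cover the whole $x$-axis) one has

(31)  $\sum_{\iota,\kappa=1}^{k} P_{\iota\kappa} = \frac{1}{n}\sum_{\nu=1}^{n}\Big[\sum_{\iota=1}^{k} p_{\nu\iota} - \Big(\sum_{\iota=1}^{k} p_{\nu\iota}\Big)^2\Big] = \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu,k+1}\,(1 - p_{\nu,k+1}) = 0$

which shows that here the reciprocal matrix $P^{-1}_{\iota\kappa}$ does not exist.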

In the “complete” case, that is, with all $p_{\nu,k+1} = 0$, the elements in each
horizontal or vertical line of the matrix $P_{\iota\kappa}$ have the sum zero. It follows that
the $k$ homogeneous equations $\sum_\iota P_{\iota\kappa}x_\iota = 0$ have the solution $x_1 = x_2 = \cdots = x_k$
and, therefore, that the cofactors of all elements of $P_{\iota\kappa}$ have one and the same
value. For each single $\nu$ the determinant of $P^{(\nu)}_{\iota\kappa}$ can be computed:

$|P^{(\nu)}_{\iota\kappa}| = p_{\nu 1}\,p_{\nu 2}\cdots p_{\nu k}\,p_{\nu,k+1}.$

If this is applied to the principal minors of the same determinant in the case
$p_{\nu,k+1} = 0$, one finds the characteristic equation of the matrix $P^{(\nu)}_{\iota\kappa}$ to be

$|\delta_{\iota\kappa} - \lambda P^{(\nu)}_{\iota\kappa}| = -\frac{d}{d\lambda}\big[(1 - \lambda p_{\nu 1})(1 - \lambda p_{\nu 2})\cdots(1 - \lambda p_{\nu k})\big] = 0.$

This shows that $(k - 1)$ characteristic roots separate the abscissas $1/p_{\nu 1}$,
$1/p_{\nu 2}, \cdots, 1/p_{\nu k}$ (one root being zero).
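
Both determinant statements can be verified exactly on a small example. The sketch below uses hypothetical probabilities and an exact rational determinant written for the purpose; none of the names come from the paper. It checks that $|P^{(\nu)}_{\iota\kappa}|$ equals $p_{\nu1}\cdots p_{\nu k}\,p_{\nu,k+1}$, and that in the complete case $|\delta_{\iota\kappa} - \lambda P^{(\nu)}_{\iota\kappa}|$ equals $\sum_\iota p_{\nu\iota}\prod_{\kappa\ne\iota}(1 - \lambda p_{\nu\kappa})$, which is minus the $\lambda$-derivative of $\prod_\kappa(1 - \lambda p_{\nu\kappa})$.

```python
from fractions import Fraction

def det(a):
    """Determinant by exact (rational) Gaussian elimination."""
    a = [row[:] for row in a]
    n, d = len(a), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def excess_matrix(p):
    """P[i][j] = p_i * delta_ij - p_i * p_j for a single distribution."""
    k = len(p)
    return [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]

def char_det(p, lam):
    """|delta_ij - lam * P_ij|."""
    P, k = excess_matrix(p), len(p)
    return det([[(1 if i == j else 0) - lam * P[i][j] for j in range(k)]
                for i in range(k)])

def derivative_form(p, lam):
    """sum_i p_i * prod_{j != i} (1 - lam p_j) = -d/dlam prod_j (1 - lam p_j)."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        term = pi
        for j, pj in enumerate(p):
            if j != i:
                term *= 1 - lam * pj
        total += term
    return total

# Hypothetical cell probabilities; the remainder 1/5 belongs to I_{k+1}.
incomplete = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)]
# Hypothetical complete case: the k = 3 probabilities sum to one.
complete = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
```

The determinant of `excess_matrix(incomplete)` comes out as 1/500, the product of the three probabilities and the remainder 1/5, and the matrix for `complete` is singular, in line with (31).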

The number $k$ of intervals has nothing to do with the preceding argument
leading to the eqs. (25) to (28). The entire computation can also be repeated





in terms of $dT_n(x_1), dT_n(x_2), dT_n(x_3), \cdots$ instead of $\xi_1, \xi_2, \xi_3, \cdots$ if appropriate
differentials are substituted for the $P^{(\nu)}_{\iota\kappa}$. To find the latter ones we note
that $p_{\nu\iota}$ stands for the increment $dV_\nu(x)$. Thus, using $\delta(x, y)$ in analogy to
$\delta_{\iota\kappa}$ ($= 1$ for $x = y$ and $= 0$ for $x \ne y$) we set

(32)  $dU_\nu(x, y) = \delta(x, y)\,dV_\nu(x) - dV_\nu(x)\,dV_\nu(y) = \delta(x, y)\,dV_\nu(x) - dW_\nu(x, y)$

which is equivalent to the definition of a function of 2 variables:

(33)  $U_\nu(x, y) = \begin{cases} V_\nu(x) - V_\nu(x)V_\nu(y) = V_\nu(x) - W_\nu(x, y) & (x \le y) \\ V_\nu(y) - V_\nu(x)V_\nu(y) = V_\nu(y) - W_\nu(x, y) & (x \ge y). \end{cases}$

Then $P_{\iota\kappa}$ has to be replaced by

(34)  $d\bar U_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} dU_\nu(x, y) = \delta(x, y)\,d\bar V_n(x) - d\bar W_n(x, y).$

This $d\bar U_n(x, y)$ is the expectation of $n\,dT_n(x)\,dT_n(y)$.
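
For a single observation the function $U_\nu(x, y)$ of (33) is the covariance of the two indicators $1\{X \le x\}$ and $1\{X \le y\}$, which is what makes it the continuous counterpart of $P^{(\nu)}_{\iota\kappa}$. A minimal sketch on a hypothetical discrete distribution (the support, weights and function names are assumptions made for illustration):

```python
def cdf(points, weights, x):
    """V(x): probability of a value <= x for a discrete distribution."""
    return sum(w for p, w in zip(points, weights) if p <= x)

def distribution_excess(points, weights, x, y):
    """U(x, y) = V(min(x, y)) - V(x) V(y), i.e. both branches of (33)."""
    return (cdf(points, weights, min(x, y))
            - cdf(points, weights, x) * cdf(points, weights, y))

def indicator_covariance(points, weights, x, y):
    """E[(1{X<=x} - V(x)) (1{X<=y} - V(y))], computed from the definition."""
    vx, vy = cdf(points, weights, x), cdf(points, weights, y)
    return sum(w * ((p <= x) - vx) * ((p <= y) - vy)
               for p, w in zip(points, weights))

# Hypothetical support and weights of one distribution V.
points = [0.0, 1.0, 2.0, 3.5]
weights = [0.1, 0.4, 0.3, 0.2]
grid = [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 3.5, 4.0]
```

On every pair of grid points the two functions agree up to rounding, and $U$ vanishes as soon as either argument lies outside the support.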

The function

(35)  $\bar U_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} U_\nu(x, y)$

is the difference of two cumulative distribution functions, one corresponding to
a distribution along the straight line $x = y$ with the element $d\bar V_n(x)$ and another
distribution over the whole plane with the element

(35')  $d\bar W_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} dV_\nu(x)\,dV_\nu(y).$

To each one-dimensional distribution $V_\nu(x)$ belongs one “distribution excess”
$U_\nu(x, y)$ as defined in (33). The $P^{(\nu)}_{\iota\kappa}$ are the increments of $U_\nu(x, y)$ within
the product interval $dx\,dy$. It is seen from the preceding argument that the
asymptotic moments of any quantic (6) or (8) depend only on the average $\bar U_n$
of the distribution excesses $U_\nu$.

If a quantic is defined by (6) and the integrals on both sides exist, the asymptotic
expectation of $f_{2m}$ may be written in formal analogy to (29) as

(36)  $n^m E_n\{f_{2m}\} \sim 1\cdot3\cdot5\cdots(2m-1)\int\!\!\int\!\cdots\!\int \psi(x_1, x_2, \cdots, x_{2m})\; d\bar U_n(x_1, x_2)\,d\bar U_n(x_3, x_4)\cdots d\bar U_n(x_{2m-1}, x_{2m}).$

This formula is identical with (29) if $\psi$ has constant values in a finite number
of intervals and vanishes outside these intervals. But it will be seen in the next
section that (36) can be used in more general cases also.

For the sake of practical computation one may develop the right-hand side





of (36) into terms explicitly depending on the given averages $\bar V_n(x)$ and $\bar W_n(x, y)$.
For example, in the case $m = 3$:

$n^3 E_n\{f_6\} \sim 1\cdot3\cdot5 \iiint \big[\psi(x_1, x_1, x_2, x_2, x_3, x_3)\,d\bar V_n(x_1)\,d\bar V_n(x_2)\,d\bar V_n(x_3)$
$\qquad - 3\,\psi(x_1, x_1, x_2, x_2, x_3, x_4)\,d\bar V_n(x_1)\,d\bar V_n(x_2)\,d\bar W_n(x_3, x_4)$
$\qquad + 3\,\psi(x_1, x_1, x_2, x_3, x_4, x_5)\,d\bar V_n(x_1)\,d\bar W_n(x_2, x_3)\,d\bar W_n(x_4, x_5)$
$\qquad - \psi(x_1, x_2, x_3, x_4, x_5, x_6)\,d\bar W_n(x_1, x_2)\,d\bar W_n(x_3, x_4)\,d\bar W_n(x_5, x_6)\big].$

In the general case, the numerical factors in the $m$-tuple integral are the binomial
coefficients of order $m$.

The higher moments of quantics $f_m$ can be computed in the same way as
$E_n\{f_m\}$ since any power of $f_m$ is a quantic again. The formulas, however, become
more involved since the coefficients of $f_m$ are not immediately given in a
symmetric form. It will suffice to show here how the (second order) variance
of $f_2$ can be found. The second moment is the expectation of

(39)  $f_2^2 = \iiiint \psi(x, y)\,\psi(z, u)\; dT_n(x)\,dT_n(y)\,dT_n(z)\,dT_n(u).$

Applying here eq. (28) we have

(40)  $n^2 E_n\{f_2^2\} \sim \iiiint \psi(x, y)\,\psi(z, u)\,\big[d\bar U_n(x, y)\,d\bar U_n(z, u) + d\bar U_n(x, z)\,d\bar U_n(y, u) + d\bar U_n(x, u)\,d\bar U_n(y, z)\big].$

The first term in the brackets leads to the square of $n E_n\{f_2\}$ while the second
and third terms, due to the symmetry of $\psi(x, y)$, supply two equal integrals.
Thus

(41)  $\mathrm{Var}(n f_2) \sim 2\iiiint \psi(x, y)\,\psi(z, u)\; d\bar U_n(x, z)\,d\bar U_n(y, u)$
$\qquad = 2\Big[\iint \psi(x, y)\,\psi(x, y)\; d\bar V_n(x)\,d\bar V_n(y) - 2\iiint \psi(x, y)\,\psi(y, z)\; d\bar V_n(y)\,d\bar W_n(x, z)$
$\qquad\quad + \iiiint \psi(x, y)\,\psi(z, u)\; d\bar W_n(x, z)\,d\bar W_n(y, u)\Big].$

In the same way moments and variances of any order can be computed for any
quantic $f_m$.

5. Final statement on the limit of expectation of quantics. We shall prove 
the following: 





Lemma $B_3$. Given a sequence of distributions $V_1(x), V_2(x), V_3(x), \cdots$ and a
quantic of order $2m$

$f_{2m} = \int\!\!\int\!\cdots\!\int \psi(x_1, x_2, \cdots, x_{2m})\; dT_n(x_1)\,dT_n(x_2)\cdots dT_n(x_{2m})$

assume that there exist a continuous function $\Psi(x)$ and a distribution $V(x)$ such that

(42)  $|\psi(x_1, x_2, \cdots, x_{2m})| \le \Psi(x_1)\,\Psi(x_2)\cdots\Psi(x_{2m}),$
$\qquad dV_\nu(x) \le dV(x)$ for $|x| > X$, $\nu = 1, 2, 3, \cdots$

and that the integrals

(42')  $\int \Psi^r(x)\,dV(x), \quad (r = 1, 2, \cdots, 2m),$

have finite values. Then, for any $\delta > 0$

(43)  $\lim_{n\to\infty} n^{m-\delta}\,E_n\{f_{2m}\} = 0.$

This lemma, on which the main theorem of Part II is based, will be established
if it is shown that the formula (36) holds true for functions $\psi$ satisfying
the conditions (42).

In the transition from the complete expression (10) for the expectation $E_n$
to the asymptotic value (25) two essential steps were made. First, certain
products of the form (11) have been omitted and, second, certain products
of $P^{(\nu)}_{\iota\kappa}$ as defined in (19) have been arbitrarily added. This was allowed because
each of the products was seen to be smaller than 1 and their number was
of the order $O(n^{m-1})$. If a quantic in integral form (6) is considered which
involves an infinite number of expressions like (10), a sharper estimate is
necessary.

It is easily seen that each integral (11') is a polynomial in $p_{\nu\kappa}$ including the
product $p_{\nu 1}p_{\nu 2}p_{\nu 3}\cdots$ and another factor which is certainly bounded whatever
the $p_{\nu\kappa}$ are. Thus, if the expectation of $\xi_1\xi_2\cdots\xi_{2m}$ is computed, each term of the
form (11') consists of a finite factor and the product $p_{\nu_1 1}\,p_{\nu_2 2}\cdots p_{\nu_{2m},2m}$. In passing
to the expectation of the quantic, the $p_{\nu\kappa}$ have to be replaced by $dV_\nu(x_\kappa)$ and
each neglected term in (10) leads to an expression like

(45)  $\int\!\!\int\!\cdots\!\int \psi(x_1, x_2, \cdots, x_{2m})\; dV_{\nu_1}(x_1)\,dV_{\nu_2}(x_2)\cdots dV_{\nu_{2m}}(x_{2m}).$

According to the assumptions of $B_3$ this integral has a finite value. The number
of neglected terms being of the order $O(n^{m-1})$ the omission of these terms is
justified.

On the other hand, products of $P^{(\nu)}_{\iota\kappa}$ equal, except for the sign, products
of $p_{\nu\iota}\,p_{\nu\kappa}$ as long as $\iota \ne \kappa$ and, except for a finite factor, products of $p_{\nu\iota}$ as often
as $\iota = \kappa$. Again it is seen that the arbitrarily added terms sum up to integrals





of the form (45). This shows that here too, if the conditions of $B_3$ are fulfilled,
the procedure leading to (25) may be applied.

It follows that, under the conditions (42), if the integral (42') has a finite 
value, eq. (36) is correct and (43) is an immediate consequence of it. On the 
other hand, it is obvious that weaker conditions than those given in B 3 would 
suffice to establish (43). 


6. Theorem on products of n functions. The principal source of all explicit
formulas on asymptotic distributions lies in certain properties of products of a
great number of factors. Laplace devoted a part of his fundamental Treatise
of Probability to these problems, but a complete outline of all results from a
modern point of view is still lacking. In the third part of the present paper, a
rather simple statement on this line will be used which may be formulated here as

Lemma C. Let $F_\nu(z_1, z_2, \cdots, z_k)$, $(\nu = 1, 2, 3, \cdots)$, be a sequence of analytic
functions of $k$ complex variables and $G_n$ the product $F_1F_2\cdots F_n$. Suppose that
at the point $z_1 = z_2 = \cdots = z_k = 0$ all $F_\nu$ have the value 1, vanishing first derivatives,
and the second derivatives



uniformly in each bounded region $|z_\iota| \le Z$ in which the absolute values of the third
derivatives of all $F_\nu$ have an upper bound $M$.

In fact, the Taylor development of $F_\nu$ supplies under the conditions stated:

(48)  $F_\nu(z_1, z_2, \cdots, z_k) = 1 + \tfrac12\sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\,z_\iota z_\kappa + O(Z^3)$

and, therefore,

(48')  $\log F_\nu(z_1, z_2, \cdots, z_k) = \tfrac12\sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\,z_\iota z_\kappa + O(Z^3).$

If here all $z_\iota$ are replaced by $z_\iota/\sqrt{n}$ and the equations added for $\nu = 1, 2, \cdots, n$
we obtain

(49)  $\log G_n\Big(\frac{z_1}{\sqrt n}, \frac{z_2}{\sqrt n}, \cdots, \frac{z_k}{\sqrt n}\Big) = \frac{1}{2n}\sum_{\nu=1}^{n}\sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\,z_\iota z_\kappa + n\,O\Big(\frac{Z^3}{n^{3/2}}\Big)$

and this shows that the brackets on the left-hand side of (47) are $O(Z^3/\sqrt{n})$.
It is obvious that (47) would still hold if the condition concerning the third
derivatives is replaced by a somewhat weaker one.
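
The mechanism of Lemma C can be watched numerically in the simplest situation of one complex variable and $n$ identical factors. The sketch below assumes the factor $F(z) = \cos z$, chosen here only because $F(0) = 1$, $F'(0) = 0$ and $F''(0) = -1$; it is not an example from the paper. Reading the lemma through (49), the product $G_n(z/\sqrt{n})$ should approach $\exp(-z^2/2)$ with an error that shrinks as $n$ grows.

```python
import cmath
import math

def product_of_factors(z, n):
    """G_n(z / sqrt(n)) for n identical factors F(z) = cos z (illustrative choice)."""
    return cmath.cos(z / math.sqrt(n)) ** n

def gaussian_limit(z, a=-1.0):
    """exp(A z^2 / 2), where A = F''(0); A = -1 for the cosine."""
    return cmath.exp(0.5 * a * z * z)

def error(z, n):
    """Absolute distance between the finite product and its limit."""
    return abs(product_of_factors(z, n) - gaussian_limit(z))

z = 0.8 + 0.3j            # an arbitrary point of the bounded region |z| <= Z
errors = {n: error(z, n) for n in (100, 10_000)}
```

For this choice of factor the next Taylor term is of fourth order, so the discrepancy falls off roughly like $1/n$, which is faster than the general bound of the lemma requires.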





PART II. DIFFERENTIABLE STATISTICAL FUNCTIONS 

1. Definitions. We consider a one-dimensional cumulative distribution function
$V(x)$ as a point in the $V$-space. If two points $V_1(x)$ and $V_2(x)$ are given
the functions

(1)  $V_1(x) + t[V_2(x) - V_1(x)], \quad 0 \le t \le 1$

represent the straight segment between $V_1(x)$ and $V_2(x)$. A subset of the $V$-space
that includes all segments determined by its elements is called a convex domain.

Now, assume that a sequence of collectives with the distributions $V_1(x)$,
$V_2(x), V_3(x), \cdots$ be given. We shall consider functions $f\{V(x)\}$ defined in a
convex domain that includes particularly: (1) all average distributions

(2)  $\bar V_n(x) = \frac{1}{n}\sum_{\nu=1}^{n} V_\nu(x)$

at least from a certain $n$ on; (2) all repartitions $S_n(x)$ that can occur, i.e. the
repartitions of $n$ quantities that belong to the label sets of the given collectives
(e.g. positive $x$, etc.). If $V^0(x)$ and $V(x)$ are any two points of the domain, the
quantity

(3)  $F(t) = f\{V^0(x) + t[V(x) - V^0(x)]\}, \quad 0 \le t \le 1$

is a function of the real variable $t$. It will be supposed to admit derivatives
with respect to $t$ up to the order $r + 1$.

Following Volterra [9, 10] we define (in a slightly modified way) the derivative
$f'$ of a statistical function $f$ in analogy to the set of partial derivatives of a function
of several variables. If $V(x)$ would stand for a set of distinct variables
$V_1, V_2, V_3, \cdots$ and $V^0(x)$ for their initial values $V_1^0, V_2^0, V_3^0, \cdots$ one would
have

$\frac{d}{dt}\,f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \sum_\nu \frac{\partial f}{\partial V_\nu}\,(V_\nu - V_\nu^0)$

where $\partial f/\partial V_\nu$ is the partial derivative of $f$ with respect to $V_\nu$ taken at the point
$V_\nu = V_\nu^0$. Thus we write

(4)  $\frac{d}{dt}\,f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \int f'\{V^0(x), y\}\; d(V - V^0)(y)$

and call $f'$ which depends on $V^0(x)$ and on a scalar variable $y$, but not on $V(x)$,
the (first) derivative of $f\{V(x)\}$ at the point $V^0(x)$. Only if a relation (4) is
fulfilled for any two points of the convex domain, $f$ is called a (one time) differentiable
function.

The derivative of a linear function

(5)  $A = \int \alpha(x)\,dV(x), \quad B = \int \beta(x)\,dV(x), \quad \cdots$





is simply the factor $\alpha(y), \beta(y), \cdots$ respectively, independent of the point at
which the derivative is taken. If $f$ is given as a function of $A, B, \cdots$ one has

(6)  $f'\{V(x), y\} = \alpha(y)\,\frac{\partial f}{\partial A} + \beta(y)\,\frac{\partial f}{\partial B} + \cdots.$

The derivative of the non-linear function

(7)  $f = \iint \psi(x, y)\,dV(x)\,dV(y)$

is

(8)  $f'\{V^0(x), y\} = \int [\psi(x, y) + \psi(y, x)]\,dV^0(x).$

Note that an additive constant in $f'$ (i.e. a quantity independent of $y$) has no
significance since the integral of $d(V - V^0)$ vanishes. It follows from (6)
that the first derivative of the $m$th order variance as defined in (2) of the Introduction,
at the point $V^0(x)$ is

(9)  $(y - a_0)^m - m\,y\int (x - a_0)^{m-1}\,dV^0(x)$

where $a_0$ is the mean value of $V^0(x)$.
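
Formula (9) can be tested against the defining relation (4) on discrete distributions, where the integrals reduce to finite sums. In the sketch below the support points, the two weight vectors and the function names are hypothetical. The left-hand side of (4) is approximated by a central difference in $t$, and the right-hand side is the sum of $f'\{V^0, y\}$ over the increments of $V - V^0$.

```python
def central_moment(points, weights, m):
    """f{V} = integral of (x - a)^m dV(x), with a the mean of V (discrete case)."""
    a = sum(w * x for x, w in zip(points, weights))
    return sum(w * (x - a) ** m for x, w in zip(points, weights))

def first_derivative(points, weights0, m, y):
    """f'{V0, y} = (y - a0)^m - m*y * integral of (x - a0)^(m-1) dV0(x), as in (9)."""
    a0 = sum(w * x for x, w in zip(points, weights0))
    mu = sum(w * (x - a0) ** (m - 1) for x, w in zip(points, weights0))
    return (y - a0) ** m - m * y * mu

def directional_derivative(points, w0, w1, m, h=1e-6):
    """Central difference of F(t) = f{V0 + t (V - V0)} at t = 0."""
    def F(t):
        mix = [a + t * (b - a) for a, b in zip(w0, w1)]
        return central_moment(points, mix, m)
    return (F(h) - F(-h)) / (2 * h)

points = [0.0, 1.0, 2.5, 4.0]        # hypothetical common support
w0 = [0.1, 0.4, 0.3, 0.2]            # weights of V0
w1 = [0.25, 0.25, 0.25, 0.25]        # weights of V
m = 3

lhs = directional_derivative(points, w0, w1, m)
rhs = sum(first_derivative(points, w0, m, y) * (b - a)
          for y, a, b in zip(points, w0, w1))
```

The two numbers `lhs` and `rhs` agree to the accuracy of the finite difference, which is what (4) asserts for this particular $f$.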

In the same way derivatives of higher order can be introduced. The second
derivative of $f\{V(x)\}$ is a function of $V^0(x)$, i.e. of the point at which the derivative
is taken, and of two scalar variables $y, z$ which correspond to the two subscripts
in the case of a function of distinct variables. The definition of
$f''\{V^0(x), y, z\}$ is given in the equation

(10)  $\frac{d^2}{dt^2}\,f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \iint f''\{V^0(x), y, z\}\; d(V - V^0)(y)\,d(V - V^0)(z).$

The second derivative of a linear function is zero. The function (7) has the
second derivative $\psi(z, y) + \psi(y, z)$ independently of $V^0(x)$. The $m$th order
variance gives, twice differentiated

(11)  $-2m\,z\,(y - a_0)^{m-1} + m(m-1)\,y z\int (x - a_0)^{m-2}\,dV^0(x).$

The variables $y$ and $z$ in $f''$ or in any additive term of $f''$ may be interchanged
and a term depending on one of them may be added or omitted. Thus, $f''$
can always be written as a symmetric function of $y, z$ without linear terms.
Accordingly, the second derivative of (7) is also $2\psi(y, z)$.






The derivative of $r$th order of $f$ at the point $V^0(x)$ will be defined by the
equation

(12)  $\frac{d^r}{dt^r}\,f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \int\!\!\int\!\cdots\!\int f^{(r)}\{V^0(x), y_1, y_2, \cdots, y_r\}\; d(V - V^0)(y_1)\cdots d(V - V^0)(y_r).$

Here, for given $V^0(x)$, $f^{(r)}$ may be supposed to be a symmetric function of the $r$
variables $y_1, y_2, \cdots, y_r$. The $r$th derivative of the $m$th order variance is

(13)  $\frac{(-1)^r\,m!}{(m - r + 1)!}\; y_1 y_2\cdots y_r\,\Big[(m - r + 1)\int (x - a_0)^{m-r}\,dV^0(x) - \sum_{\rho=1}^{r}\frac{(y_\rho - a_0)^{m-r+1}}{y_\rho}\Big].$

In the case $r = m$ the expression becomes independent of $V^0(x)$, viz.

(13')  $(-1)^m\,m!\;y_1 y_2\cdots y_m\,(1 - m)$

where terms depending on less than $r$ of the variables $y_1, y_2, \cdots, y_r$ have been
omitted.

If the definitions (4), (10), (12) are confronted one can see that $f''\{V, y, z\}$
is the first derivative of $f'\{V, y\}$ etc. For proofs see [9] and [10].


2. Taylor development. The function $F(t)$ defined in (3) admits the development

(14)  $F(1) - F(0) = F'(0) + \tfrac12 F''(0) + \cdots + \frac{1}{r!}\,F^{(r)}(0) + \frac{1}{(r+1)!}\,F^{(r+1)}(\vartheta)$

where $\vartheta$ is some quantity between zero and one. According to (3) the left-hand
side equals the difference $f\{V(x)\} - f\{V^0(x)\}$. The expressions $F'(0), F''(0), \cdots,$
$F^{(r)}(0)$ are the derivatives as defined in eqs. (4), (10), (12). In the last term
to the right, one has to introduce the distribution

(15)  $V'(x) = V^0(x) + \vartheta[V(x) - V^0(x)]$

and then to take the $(r + 1)$st derivative of $f$ at the point $V'(x)$.

For a given $V^0(x)$ each one of the terms on the right-hand side of (14) is a
function of $V(x)$. Except for the last one, in which $V'$ depends in a certain way
on $V(x)$, they are quantics with respect to $V(x) - V^0(x)$, of the same kind as
those considered in Part I. (There we had $S_n$ instead of $V$ and $\bar V_n$ instead
of $V^0$.)





The $r$th term of (14) can be written as

(16)  $F_r = \frac{1}{r!}\int\!\!\int\!\cdots\!\int \psi(x_1, x_2, \cdots, x_r)\; d(V - V^0)(x_1)\cdots d(V - V^0)(x_r)$

where according to (12)

(16')  $\psi(x_1, x_2, \cdots, x_r) = f^{(r)}\{V^0(x), x_1, x_2, \cdots, x_r\}.$

To find the characteristic properties of $F_r$ we compute its derivatives at a point
$V_1(x)$. To do this we must replace in (16) the $V(x)$ by

$V_1(x) + t[V(x) - V_1(x)]$

then differentiate the product

(17)  $\prod_{\kappa=1}^{r} d\big[(V_1 - V^0)(x_\kappa) + t\,(V - V_1)(x_\kappa)\big]$

with respect to $t$, and finally set $t = 0$. The derivative consists of $r$ terms
the first of which will be

$d(V - V_1)(x_1)\prod_{\kappa=2}^{r} d(V_1 - V^0)(x_\kappa).$

Due to the fact that $\psi$ may be supposed as a symmetric function, all $r$ terms
supply the same integral. Thus the derivative of $F_r$ with respect to $t$ at the point
$t = 0$ can be written as

$\frac{1}{(r-1)!}\int\!\!\int\!\cdots\!\int \psi(x_1, x_2, \cdots, x_r)\; d(V - V_1)(x_1)\prod_{\kappa=2}^{r} d(V_1 - V^0)(x_\kappa).$

Comparing this with the formula (4) which defines the first derivative of a
statistical function and writing $y$ instead of $x_1$ and $V(x)$ instead of $V_1(x)$, we find

(18)  $F_r'\{V(x), y\} = \frac{1}{(r-1)!}\int\!\!\int\!\cdots\!\int \psi(y, x_2, \cdots, x_r)\; d(V - V^0)(x_2)\cdots d(V - V^0)(x_r).$

This is the first derivative of $F_r\{V(x)\}$ at the point $V(x)$. It vanishes at the
point $V(x) = V^0(x)$.

The integral in (18) has the same form as that in (16) except that its multiplicity
is $(r - 1)$ rather than $r$. Thus it is immediately seen how the higher
derivatives of $F_r$ can be found. For the second derivative $F_r''\{V(x), y, z\}$
we have simply to replace $(r - 1)!$ in (18) by $(r - 2)!$, then $x_2$ by $z$ and finally
to omit in the product the differential $d(V - V^0)(x_2)$. This procedure can be
continued up to the derivative of order $(r - 1)$. The $r$th derivative, finally,





will be

(19)  $F_r^{(r)}\{V(x), y_1, y_2, \cdots, y_r\} = \psi(y_1, y_2, \cdots, y_r)$

independent of $V(x)$ and, according to (16'), equal to the $r$th derivative of
$f\{V(x)\}$ at the point $V^0(x)$. It is also seen that all integrals of the form (16)
or (18) vanish if $V(x)$ equals $V^0(x)$. The results can be summarized as follows:
The $s$th term, $(s = 1, 2, \cdots, r)$, of the development (14) is a function of $V(x)$
for which all derivatives at the point $V^0(x)$ except that of order $s$ vanish while
this one equals the $s$th derivative of the original function $f\{V(x)\}$ at $V^0(x)$.
The complete analogy of (14) with the Taylor development of a function of
distinct variables is thus evident.

If we assume that $f\{V(x)\}$ is a function whose first $(r - 1)$ derivatives vanish
at the point $V^0(x)$, eq. (14) takes the form

(20)  $f\{V(x)\} - f\{V^0(x)\} = \frac{1}{r!}\int\!\!\int\!\cdots\!\int f^{(r)}\{V^0(x), y_1, y_2, \cdots, y_r\}\; d(V - V^0)(y_1)\cdots d(V - V^0)(y_r)$
$\qquad + \frac{1}{(r+1)!}\int\!\!\int\!\cdots\!\int f^{(r+1)}\{V'(x), y_1, y_2, \cdots, y_{r+1}\}\; d(V - V^0)(y_1)\cdots d(V - V^0)(y_{r+1}).$

By applying to this formula the lemmas A and B of Part I, we shall arrive at
the general theorem on asymptotic distributions that is the principal goal of
this paper.


3. General theorem. The main result to be derived in the general theory of 
asymptotic distributions is that the so-called normal distribution represents 
the first element in an infinite sequence which includes the asymptotic distributions
of all differentiable statistical functions, except certain irregular
cases. The Gauss distribution covers in fact only those functions whose Taylor 
development starts with the first (linear) term, in particular the linear statistical 
functions themselves. If the first (r — 1) terms in the development vanish, 
the asymptotic distribution of type r becomes valid. 

Theorem I: Let $V_1(x), V_2(x), V_3(x), \cdots$ be an infinite sequence of distributions
and $f\{V(x)\}$ a statistical function with derivatives up to order $(r + 1)$. Denote by
$S_n(x)$ the repartition of the $n$ label values in the collective with the distribution element
$dV_1(x_1)\,dV_2(x_2)\cdots dV_n(x_n)$ and by $\bar V_n(x)$ the arithmetical mean of $V_1(x)$,
$V_2(x), \cdots, V_n(x)$. If for large $n$ the first $(r - 1)$ derivatives of $f\{V(x)\}$ at the
point $\bar V_n(x)$ vanish and the $r$th derivative equals $\psi_n(y_1, y_2, \cdots, y_r)$, then the
distribution of

(21)  $A_n = n^{r/2}\,\big[f\{S_n(x)\} - f\{\bar V_n(x)\}\big]$





is asymptotically equal to the distribution of the $r$th order quantic

(22)  $B_n = \frac{n^{r/2}}{r!}\int\!\!\int\!\cdots\!\int \psi_n(x_1, x_2, \cdots, x_r)\; d(S_n - \bar V_n)(x_1)\,d(S_n - \bar V_n)(x_2)\cdots d(S_n - \bar V_n)(x_r)$

under the following conditions:

a) The distribution of (22) has a uniformly bounded derivative for all $n$;

b) Within a convex domain in the $V$-space that includes all $\bar V_n(x)$ from a certain
$n$ on, and all $S_n(x)$ that can occur, the $(r + 1)$st derivative of $f\{V(x)\}$ is smaller
in absolute value than a product $\Psi(y_1)\,\Psi(y_2)\cdots\Psi(y_{r+1})$ whereby the
integrals $\int [\Psi(x)]^k\,dV_\nu(x)$ for $k = 1, 2, \cdots, 2(r + 1)$ have a finite upper
bound for $\nu = 1, 2, 3, \cdots$.

In order to prove this we introduce in eq. (20) $S_n(x)$ for $V(x)$ and $\bar V_n(x)$ for
$V^0(x)$, and multiply both sides by $n^{r/2}$. Using the notations (21) and (22) and
writing $T_n$ for $(S_n - \bar V_n)$, the equation reads

(23)  $A_n - B_n = \frac{n^{r/2}}{(r+1)!}\int\!\!\int\!\cdots\!\int \psi_{r+1}(y_1, y_2, \cdots, y_{r+1})\; dT_n(y_1)\cdots dT_n(y_{r+1})$

where $\psi_{r+1}$ stands for the $(r + 1)$st derivative of $f$ taken at the intermediate point $V'(x)$ of (15).

According to Lemma A the theorem will be verified if we can show that the
expectation of the absolute value of the right-hand expression in (23) tends to
zero.

According to the Schwarz inequality one has, for any real $C$:

(24)  $E_n\{|C|\} \le \sqrt{E_n\{C^2\}}.$

For fixed values of $\bar V_n$ and $S_n$ the integral on the right-hand side of (23) is a
quantic of order $(r + 1)$ with the coefficients $\psi_{r+1}(y_1, y_2, \cdots, y_{r+1})$. The
square of this integral is a quantic of order $2(r + 1)$ whose coefficients are a finite
number (depending only on $r$) of terms each of which is a product of two $\psi_{r+1}$-values
implying $2(r + 1)$ variables $y_1, y_2, \cdots, y_{2(r+1)}$. The absolute value of
these coefficients is, therefore, according to the condition b) smaller than a
finite factor times the product $\Psi(y_1)\,\Psi(y_2)\cdots\Psi(y_{2(r+1)})$ and thus fulfills the
condition of lemma $B_3$. If the right-hand side of (23) is identified with $C$, the
expectation of $C^2$ is, except for a finite factor, the product of $n^r$ times the expectation
of the above-mentioned quantic of order $2(r + 1)$. It then follows from lemma
$B_3$ that the limit of $E_n\{C^2\}$ is zero and from (24):

$\lim E_n\{|C_n|\} = \lim E_n\{|A_n - B_n|\} = 0.$

This accomplishes the proof of Theorem I.

If we apply here what was shown in Part I about the asymptotic distribution 
of a quantic, we can also state the following. 




Theorem II: Under the conditions of Theorem I, the asymptotic distribution of a
differentiable statistical function $f\{S_n(x)\}$ is essentially determined by

a) the average distribution $\bar V_n(x)$;

b) the first non-vanishing derivative of $f\{V(x)\}$ at the point $\bar V_n(x)$;

c) the average distribution excess

(25)  $\bar U_n(x, y) = \begin{cases} \bar V_n(x) - \dfrac{1}{n}\displaystyle\sum_{\nu=1}^{n} V_\nu(x)\,V_\nu(y), & x \le y \\[6pt] \bar V_n(y) - \dfrac{1}{n}\displaystyle\sum_{\nu=1}^{n} V_\nu(x)\,V_\nu(y), & x \ge y. \end{cases}$

By “essentially determined” is meant determined except for an additional
function whose moments of any order are zero. The statement then follows
from Theorem I in connection with the fact that the asymptotic moments of
quantics have been computed in Part I from the values of $\bar U_n(x, y)$.

That functions with all moments vanishing exist has been known for a long
time. A simple example given by Shohat and Tamarkin [6] is the following.
Let $k$ be a positive constant smaller than $\tfrac12$, and $u = x^k$, $\lambda = \tan k\pi$. Then,
the density (positive or negative)

(26)  $\varphi(x) = e^{-u}\sin(\lambda u) = \mathrm{Im}\; e^{-u(1 - \lambda i)}$

fulfills the condition. In fact, the $n$th moment of (26) is the (vanishing) imaginary
part of the integral

(27)  $\frac{1}{k}\int_0^\infty u^{(n+1)/k - 1}\,e^{-u(1 - \lambda i)}\,du = \frac{(-1)^{n+1}}{k}\,(\cos k\pi)^{(n+1)/k}\;\Gamma\Big(\frac{n+1}{k}\Big).$

Since $\varphi(x)$ takes negative values of the amount $e^{-u}$ it can be superimposed to a
given distribution density only in cases where the original density remains
greater than some multiple of $e^{-u} = \exp(-x^k)$. It can be shown that the moment
problem is determinate (i.e. the distribution determined by the moments in a
unique way) if the density vanishes at infinity at a sufficiently strong degree.
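
The vanishing of all moments of (26) can be observed numerically. The sketch below is an illustration under assumptions: it takes $k = 1/4$, for which $\tan k\pi = 1$, substitutes $x = u^{1/k}$, and evaluates the resulting integral over $u$ with a composite Simpson rule; the integration range, step count and function names are choices made here, not taken from the paper. Each moment is compared with the same integral without the sine factor, which is $\Gamma((n+1)/k)/k$.

```python
import math

def simpson(f, a, b, steps):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def moment(n, k=0.25, upper=80.0, steps=40000):
    """n-th moment of phi(x) = exp(-u) sin(lam u), u = x**k, integrated over u."""
    lam = math.tan(k * math.pi)
    power = (n + 1) / k - 1            # x^n dx = (1/k) u^power du
    return simpson(
        lambda u: u ** power * math.exp(-u) * math.sin(lam * u) / k,
        0.0, upper, steps)

def scale(n, k=0.25):
    """(1/k) * Gamma((n+1)/k): the same integral with the sine factor removed."""
    return math.gamma((n + 1) / k) / k

# Ratio of each moment to its natural scale, for n = 0, 1, 2.
ratios = [abs(moment(n)) / scale(n) for n in range(3)]
```

The ratios are zero to within the quadrature error, although the positive and negative parts of the integrand are individually of the order of `scale(n)`.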

From the standpoint of statistical theory two distributions with the same
moments throughout may be considered as equivalent. This justifies the terminology
used in Theorem II. On the other hand, Theorem I is independent of
this restriction: The asymptotic distribution of the statistical function $f\{S_n(x)\}$
is under the given conditions identical with that of the corresponding quantic
of $m$th order. A detailed discussion of the case $m = 2$ will be given in Part III.
Here follow some illustrations for the general case.

4. Illustrations. The existence of asymptotic distributions of higher types 
can be exemplified in a comparatively simple way if we start from any known 
asymptotic distribution of a statistical function. 

Let us assume that $g\{V(x)\}$ is a function fulfilling the condition

(28)  $g\{\bar V_n(x)\} = 0$




for all $n$, and that the asymptotic c.d.f. for $g\{S_n(x)\}$ is known. There will be
some positive integer $r$ such that

(29)  $\mathrm{Prob}\,[\,g\{S_n(x)\} \le z\,n^{-r/2}\,] \sim \Phi_n(z).$

If, for instance, $g$ is a linear statistical function $r$ will be 1 and, under well-known
conditions, $\Phi_n(x)$ a normal (Gaussian) c.d.f. with finite variance depending
on $n$.

Now, let $f$ be an ordinary function of $g$ and thus another statistical function
which may be denoted by $f\{V(x)\}$. According to the rules of differentiation
we have

(30)  $f'\{V(x), y\} = \frac{df}{dg}\; g'\{V(x), y\}$

and analogous relations can be derived for the derivatives of higher order. In
particular, the following statement, valid in ordinary differential calculus, holds
true: If $g\{V(x)\}$ has derivatives of every order and if the first $s$ derivatives of $f$
with respect to $g$ vanish at some point $g = g\{V_1(x)\}$ then also the $s$ first derivatives
of $f$ with respect to $V(x)$ will be zero at $V(x) = V_1(x)$. In this way we can
devise statistical functions, with vanishing derivatives, for which the asymptotic
distribution is known.

For the sake of simplicity we may assume that (29) holds with $r = 1$ and
that $f(g)$ is a monotonic increasing function, given in the form

(31)  $f(g) = g^s\,[1 + \alpha(g)]$

with $s$ a positive integer, and the inverse function

(31')  $g(f) = f^{1/s}\,[1 + \beta(f)]$

where $\beta(f)$ goes to zero with $f \to 0$. Then, from (29):

(32)  $\mathrm{Prob}\,[\,f\{S_n(x)\} \le z\,n^{-s/2}\,] \sim \Phi_n(z')$

if $z$ and $z'$ are connected by

$n^{-1/2}\,z' = g(n^{-s/2}\,z) = n^{-1/2}\,z^{1/s}\,[1 + \beta(n^{-s/2}\,z)].$

It follows that

$z' - z^{1/s} \to 0$

and if $\Phi_n(z')$ is supposed to be continuous, (32) becomes

(33)  $\mathrm{Prob}\,[\,f\{S_n(x)\} \le z\,n^{-s/2}\,] \sim \Phi_n(z^{1/s}).$

This is a distribution of type $s$.

Take as an example for $g$ the arithmetical mean

(34)  $g\{S_n(x)\} = \frac{x_1 + x_2 + \cdots + x_n}{n} - \bar a_n$





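where $x_1, x_2, \cdots, x_n$ are the observed values and $\bar a_n$ is the arithmetical mean
of the mean values of $V_\nu(x)$. Then, under certain restrictions for the $V_\nu(x)$,
there exists a bounded sequence $h_n$ so that

$\mathrm{Prob}\,[\sqrt{n}\,g \le z] \sim \Phi_n(z) = \frac{h_n}{\sqrt\pi}\int_{-\infty}^{z} e^{-h_n^2 u^2}\,du.$

Now if we choose

$f = 6(g - \sin g) = g^3\Big(1 - \frac{g^2}{20} + \cdots\Big)$

the asymptotic distribution of $f$ will be given by

$\mathrm{Prob}\,[\,n\sqrt{n}\,f \le z\,] \sim \Phi_n(\sqrt[3]{z}) = \frac{h_n}{\sqrt\pi}\int_{-\infty}^{\sqrt[3]{z}} e^{-h_n^2 u^2}\,du$

with the probability density

$\frac{h_n}{3\sqrt\pi}\; z^{-2/3}\,e^{-h_n^2 z^{2/3}}.$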
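
A small simulation makes the third-type distribution tangible. The sketch below assumes observations drawn uniformly from $(-1, 1)$, an illustrative choice for which $\bar a_n = 0$ and the variance is $1/3$, so that $h_n^2 = 3/2$ in the notation above. It draws repeated samples, forms $n\sqrt{n}\,f$ with $f = 6(g - \sin g)$, and compares the empirical frequency of $n\sqrt{n}\,f \le z$ with the normal c.d.f. evaluated at $\sqrt[3]{z}$. The sample size, number of trials and seed are arbitrary.

```python
import math
import random

def normal_cdf(x, sigma):
    """C.d.f. of a centered normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def cube_root(z):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(z) ** (1.0 / 3.0), z)

def simulate_scaled_f(n, trials, rng):
    """Draws of n*sqrt(n)*f, f = 6(g - sin g), g = mean of n uniform(-1, 1) values."""
    out = []
    for _ in range(trials):
        g = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        out.append(n * math.sqrt(n) * 6.0 * (g - math.sin(g)))
    return out

rng = random.Random(0)                      # fixed seed for repeatability
draws = simulate_scaled_f(n=200, trials=4000, rng=rng)
sigma = math.sqrt(1.0 / 3.0)                # standard deviation of uniform(-1, 1)
z = 0.05
empirical = sum(d <= z for d in draws) / len(draws)
predicted = normal_cdf(cube_root(z), sigma)
```

With these settings `empirical` and `predicted` differ only by sampling noise; the point $z = 0.05$ is deliberately close to zero, where the density $z^{-2/3}$ is unbounded and the c.d.f. rises steeply.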

Similar examples can be drawn from the asymptotic distribution of $\chi^2$ if one
asks for the distribution of appropriate functions of $\chi^2$, etc.

PART III. SECOND-TYPE ASYMPTOTIC DISTRIBUTION 

1. Statement of the problem. We now propose to study the asymptotic
distribution of a quantic of second order as defined in eq. (6) of Part I. It
has been shown in Part II that this covers the case of any statistical function
of which the first but not the second derivative at the critical point vanishes.

Independently of what was said before, the problem can be stated in the following
way. Given a function $\psi(x, y)$ and a sequence of cumulative distribution
functions $V_1(x), V_2(x), V_3(x), \cdots$. Let $\bar V_n(x)$ be the arithmetical mean of
$V_1(x), V_2(x), \cdots, V_n(x)$ and $S_n(x)$ the repartition of a sample $z_1, z_2, \cdots, z_n$
drawn from the collective with the distribution element $dV_1(z_1)\,dV_2(z_2)\cdots$
$dV_n(z_n)$, that is: $nS_n(x)$ is the number of those of the observed values
$z_1, z_2, \cdots, z_n$ that are smaller than or equal to $x$. Then the quantity

(1)  $f = \iint \psi(x, y)\,dT_n(x)\,dT_n(y), \quad\text{where } T_n(x) = S_n(x) - \bar V_n(x)$

is determined by the observations $z_1, z_2, \cdots, z_n$. We ask for the distribution
of $f$ at large values of $n$.

Without loss of generality, the function $\psi(x, y)$ can be supposed to be symmetrical.
If, in particular, $\psi(x, y) = \psi(x)\,\psi(y)$, the quantity $f$ becomes the
square of

(2)  $\int \psi(x)\,dT_n(x) = \frac{1}{n}\sum_{\nu=1}^{n}\Big[\psi(z_\nu) - \int \psi(z)\,dV_\nu(z)\Big]$





and its asymptotic distribution can be computed in the manner shown in the last
section of Part I. Another example would be

(3)  $\psi(x, y) = \begin{cases} g(x) & (x \le y) \\ g(y) & (x \ge y). \end{cases}$

In this case, integration by parts shows that

(4)  $f\{S_n(x)\} = \int g'(x)\,T_n^2(x)\,dx$

where $g'$ is the derivative of $g$. This is the statistical function that takes the
place of $\chi^2$ in continuous problems. See Introduction eq. (3).

Note that the “excess” $T_n(x)$ vanishes at $x = \pm\infty$ and that for sufficiently
large $x$ the increment $dT_n(x)$ equals $-d\bar V_n(x)$. Thus, conditions for the existence
of the integrals in (1), (2), (4), etc. can be expressed in terms of the given
functions $\psi(x, y)$ and $V_\nu(x)$.

We shall first study the special case that implies so-called discontinuous chance 
variables. In our terminology it is the function y ) that has to be specified. 

Let $I_1, I_2, \ldots, I_k$ be $k$ mutually exclusive one-dimensional intervals (or groups
of intervals) and $I_{k+1}$ their complement. Assume that $\psi(x, y)$ has a constant
value $\psi_{\iota\kappa}$ when $x$ falls in $I_\iota$ and $y$ falls in $I_\kappa$, $(\iota, \kappa = 1, 2, \ldots, k + 1)$.
The increments of $S_n(x)$, $\bar V_n(x)$, $T_n(x)$ in the interval $I_\kappa$ will be called
$\rho_\kappa$, $\bar p_\kappa$, $\xi_\kappa$ respectively. Clearly, $n\rho_\kappa$ is the number of observed values
falling in $I_\kappa$, $n\bar p_\kappa$ is the expected number of such values, and
$n(\rho_\kappa - \bar p_\kappa) = n\xi_\kappa$ the excess of observed over expected numbers. Note that
the given distributions $V_\nu(x)$ determine increments $p_{\nu\kappa}$ in the interval $I_\kappa$ and that

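(5)    $\bar p_\kappa = \frac{1}{n} (p_{1\kappa} + p_{2\kappa} + \cdots + p_{n\kappa})$.

Since the sum of all $\xi_\kappa$ must be zero we can replace $\xi_{k+1}$ by

(6)    $\xi_{k+1} = -\xi_1 - \xi_2 - \cdots - \xi_k$.

Thus, the integral (1) can now be written as a sum of $k^2$ terms

(7)    $f\{S_n(x)\} = \sum_{\iota,\kappa} \psi_{\iota\kappa}\, \xi_\iota\, \xi_\kappa$

like that introduced in the second eq. (8) of Part I.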
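
The passage from observations to the excesses and then to the sum (7) can be sketched as follows. This is only an illustration under stated assumptions: the intervals are taken as a partition of the whole line by hypothetical `cut_points` (so the complement interval is empty), values equal to a cut point are counted in the interval to its left, in keeping with "smaller than or equal to", and all function and parameter names are invented for the example.

```python
from bisect import bisect_left


def excesses(observations, cut_points, mean_probs):
    """Excess per interval: observed relative frequency minus mean probability.

    Intervals are (-inf, c_0], (c_0, c_1], ..., (c_last, +inf) for the sorted
    cut points c_i; mean_probs[j] is the mean probability of interval j.
    """
    n = len(observations)
    counts = [0] * (len(cut_points) + 1)
    for z in observations:
        counts[bisect_left(cut_points, z)] += 1
    return [c / n - p for c, p in zip(counts, mean_probs)]


def quadratic_form(psi, xi):
    """Double sum of psi[i][k] * xi[i] * xi[k], as in eq. (7)."""
    m = len(xi)
    return sum(psi[i][k] * xi[i] * xi[k] for i in range(m) for k in range(m))


# Four observations, two intervals of probability 1/2 each.
xi = excesses([0.1, 0.2, 0.3, 0.9], [0.5], [0.5, 0.5])
f = quadratic_form([[2.0, 0.0], [0.0, 2.0]], xi)  # psi = delta / p
```

With the diagonal choice of `psi` used here, `n * f` coincides with the usual Pearson sum of (observed - expected)^2 / expected for the two cells.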

Our next task will be to find the asymptotic distribution of (7) which depends
on the matrix $\psi_{\iota\kappa}$ $(\iota, \kappa = 1, 2, \ldots, k)$, and on the succession of probability
values $p_{\nu\kappa}$, $(\nu = 1, 2, 3, \ldots;\ \kappa = 1, 2, \ldots, k)$. The matrix $\psi_{\iota\kappa}$ in $k$ variables
will be supposed to be symmetrical.

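2. Characteristic function. We define our chance variable as

(8)    $\chi = \frac{n}{2} f = \frac{n}{2} \sum_{\iota,\kappa} \psi_{\iota\kappa}\, \xi_\iota\, \xi_\kappa$.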




All summations, here and in what follows, are to be extended from 1 to $k$ if
not otherwise indicated. If $P_n(x)$ is the c.d.f. of $\chi$, that is

(9)    $\mathrm{Prob}\{\chi \le x\} = P_n(x)$,

the characteristic function (c.f.) is defined by

(10)    $Q_n(u) = E\{e^{ui\chi}\} = \int e^{uix}\, dP_n(x)$.

In order to compute $Q_n$ we assume that the quadratic form (8) is transformed,
by a linear transformation, into a sum of squares. Using appropriate (in general
complex) coefficients $\alpha_{\iota\kappa}$ one can write

(11)    $\chi = \frac{n}{2} (\eta_1^2 + \eta_2^2 + \cdots + \eta_k^2)$,    $\eta_\iota = \sum_\kappa \alpha_{\iota\kappa}\, \xi_\kappa$.

(The form $\psi_{\iota\kappa}$ is here supposed to be non-singular which, however, means no
loss of generality). It will be seen later that explicit knowledge of the $\alpha_{\iota\kappa}$
is not needed.

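Now, for any real or complex $y$ the identity holds:

(12)    $e^{y^2/2} = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{-t^2/2 + yt}\, dt$.

If we write $v$ for $\sqrt{ui}$ and replace in (12) successively $y$ by $v\sqrt{n}\,\eta_1$, $v\sqrt{n}\,\eta_2$,
$\ldots$, $v\sqrt{n}\,\eta_k$, we find

(13)    $e^{\chi ui} = (2\pi)^{-k/2} \int\!\!\int \cdots \int \exp\left[ -\tfrac12 \sum_\kappa t_\kappa^2 + v\sqrt{n} \sum_\kappa z_\kappa \xi_\kappa \right] dt_1\, dt_2 \cdots dt_k$

where

(14)    $\sum_\kappa z_\kappa \xi_\kappa = \sum_\kappa \eta_\kappa t_\kappa$,    $z_\kappa = \sum_\iota \alpha_{\iota\kappa}\, t_\iota$,    $(\kappa = 1, 2, \ldots, k)$.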
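
The step from (12) to (13) rests on the Gaussian integral identity $e^{y^2/2} = (2\pi)^{-1/2} \int e^{-t^2/2 + yt}\, dt$, which holds for complex as well as real $y$. A small numerical check can make this concrete; the sketch below is an illustration only, with a hypothetical function name, a truncated integration range and the trapezoidal rule standing in for the exact integral.

```python
import cmath
import math


def gaussian_integral(y, half_width=12.0, steps=24000):
    """Trapezoidal value of (2*pi)^(-1/2) * integral of exp(-t^2/2 + y*t) dt.

    The infinite range is truncated to [-half_width, half_width]; y may be
    real or complex.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for s in range(steps + 1):
        t = -half_width + s * h
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * cmath.exp(-0.5 * t * t + y * t)
    return total * h / math.sqrt(2.0 * math.pi)


real_case = gaussian_integral(1.0)   # should be close to exp(1/2)
imag_case = gaussian_integral(1j)    # should be close to exp(-1/2)
```

The purely imaginary case corresponds to the situation of interest here, since $v^2 = ui$ makes the exponent in (13) complex.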

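Since the first exponential factor in the integrand is a constant with respect
to the chance variable, the expected value of $e^{\chi ui}$ is given by

(15)    $Q_n(u) = E\{e^{\chi ui}\} = (2\pi)^{-k/2} \int\!\!\int \cdots \int \exp\left[ -\tfrac12 \sum_\kappa t_\kappa^2 \right] G_n\, dt_1\, dt_2 \cdots dt_k$

with

(16)    $G_n = E\left\{ \exp\left[ v\sqrt{n} \sum_\kappa z_\kappa \xi_\kappa \right] \right\}$.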

In order to find $G_n$ we consider the following $n$ collectives $C_1, C_2, \ldots, C_n$
with discontinuous, $(k + 1)$-valued distributions: In $C_\nu$ the label values are
$z_1, z_2, \ldots, z_k$, and $z_{k+1}$, with $z_{k+1} = 0$, their probabilities $p_{\nu 1}, p_{\nu 2}, \ldots, p_{\nu, k+1}$.
The c.f. of this distribution at the point $-iv/\sqrt{n}$ is

(17)    $\sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v z_\kappa/\sqrt{n}}$.


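If we multiply the $n$ expressions (17) for $\nu = 1, 2, \ldots, n$ the product will be,
according to well-known rules of probability calculus, the c.f. for the distribution
of the sum of the $n$ label components in the collective formed by combining
$C_1, C_2, \ldots, C_n$. This sum is

        $\sum_\kappa n \rho_\kappa z_\kappa$

and therefore,

(18)    $E\left\{ \exp\left[ v\sqrt{n} \sum_\kappa \rho_\kappa z_\kappa \right] \right\} = \prod_{\nu=1}^{n} \left[ \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v z_\kappa/\sqrt{n}} \right]$.

Multiplying both sides of this equation by

(19)    $\exp\left[ -v\sqrt{n} \sum_\kappa \bar p_\kappa z_\kappa \right] = \exp\left[ -\frac{v}{\sqrt{n}} \sum_\nu \sum_\kappa p_{\nu\kappa} z_\kappa \right]$

and using the abbreviation

(20)    $\bar z_\nu = \sum_\kappa p_{\nu\kappa}\, z_\kappa$

we arrive at

(21)    $G_n = E\left\{ \exp\left[ v\sqrt{n} \sum_\kappa \xi_\kappa z_\kappa \right] \right\} = F_1 F_2 \cdots F_n$

with

(22)    $F_\nu = \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v(z_\kappa - \bar z_\nu)/\sqrt{n}}$.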

This solves the problem: By inserting (21), (22) in (15) and carrying out the
integration with respect to $t_1, t_2, \ldots, t_k$ one has expressed $Q_n(u)$ in terms
of the given $p_{\nu\kappa}$ and of the coefficients $\alpha_{\iota\kappa}$ which link the $z_\kappa$ to the $t_\kappa$. This
expression for $Q_n(u)$ holds for all $n$.
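
The product (21) with the factors (22) is easy to evaluate directly, and doing so gives a feeling for the limit that Section 3 derives in eq. (32). The following sketch is an illustration under simplifying assumptions: $v$ is taken real (in the text $v = \sqrt{ui}$ is complex), the $z_\kappa$ are supplied directly rather than through the $t_\kappa$, and the names `exact_G` and `limiting_G` are invented for the example.

```python
import math


def exact_G(p_rows, z, v):
    """Product F_1 * ... * F_n of eq. (21)-(22) for real v.

    p_rows[nu] holds the k+1 probabilities of collective nu; z holds
    z_1..z_k, and z_{k+1} = 0 is appended here.
    """
    n = len(p_rows)
    labels = list(z) + [0.0]
    g = 1.0
    for p in p_rows:
        z_bar = sum(pk * zk for pk, zk in zip(p, labels))
        g *= sum(pk * math.exp(v * (zk - z_bar) / math.sqrt(n))
                 for pk, zk in zip(p, labels))
    return g


def limiting_G(p_rows, z, v):
    """exp(v^2/2 * sum of Pbar_ik z_i z_k), the limit stated in eq. (32).

    The double sum equals the mean over nu of the variance of the label.
    """
    n = len(p_rows)
    labels = list(z) + [0.0]
    q = 0.0
    for p in p_rows:
        mean = sum(pk * zk for pk, zk in zip(p, labels))
        q += sum(pk * zk * zk for pk, zk in zip(p, labels)) - mean * mean
    return math.exp(0.5 * v * v * q / n)


rows = [[0.2, 0.3, 0.5]] * 10000   # n equal three-valued distributions
ratio = exact_G(rows, [1.0, -0.5], 0.7) / limiting_G(rows, [1.0, -0.5], 0.7)
```

For large `n` the ratio should be close to 1, which is the content of the asymptotic statement.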

We have still to show that the integral (15) exists, at least for small $|u|$ or
$|v|$, independently of the value of $n$. For this purpose we develop $F_\nu$, as given
in (22), in the neighborhood of $v = 0$. At this point $F_\nu = 1$ and the first
derivative vanishes by virtue of (20). We thus have

(23)    $F_\nu = 1 + \frac{v^2}{2n} \sum_{\kappa=1}^{k+1} p_{\nu\kappa} (z_\kappa - \bar z_\nu)^2\, e^{\vartheta v (z_\kappa - \bar z_\nu)/\sqrt{n}}$

with $|\vartheta| \le 1$. From the definition of $z_\kappa$ in (14) it follows that the ratio $|z_\kappa|/T$
with

(24)    $T^2 = t_1^2 + t_2^2 + \cdots + t_k^2$

has an upper bound depending on the $\alpha_{\iota\kappa}$ only. On the other hand, according
to (20), $\bar z_\nu$ is a weighted mean of the $z_\kappa$ and, therefore, $|z_\kappa - \bar z_\nu|$ will not surpass
twice the maximum $|z_\kappa|$:

(25)    $|z_\kappa - \bar z_\nu| < aT$

where $a$ is a positive function of the coefficients $\alpha_{\iota\kappa}$ which, in turn, are
determined by the $\psi_{\iota\kappa}$. Introducing (25) in (23) we find

        $|F_\nu| < 1 + \frac{|v^2|\, a^2 T^2}{2n}\, e^{|v| a T/\sqrt{n}} \le e^{|v^2| a^2 T^2/n}$

and, finally, from (21):

(26)    $|G_n| < e^{|v^2| a^2 T^2} = e^{|u| a^2 T^2}$.

Thus it is seen that for

(27)    $|u| < \frac{1}{2a^2}$    or    $1 - 2a^2 |u| \ge \eta^2 > 0$

the integral (15) admits the upper bound

(28)    $|Q_n(u)| < (2\pi)^{-k/2} \int\!\!\int \cdots \int e^{-\eta^2 T^2/2}\, dt_1\, dt_2 \cdots dt_k = \eta^{-k}$.

It also follows that the contribution to $Q_n(u)$ from the region $T > T_0$ tends to
zero with increasing $T_0$, uniformly with respect to $n$ and with respect to $u$ in
the region $|u| < 1/2a^2$.


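3. Asymptotic value of $Q_n(u)$. If the quantity $F_\nu$ introduced in (22) is
considered as a function of $z_1/\sqrt{n}, z_2/\sqrt{n}, \ldots, z_k/\sqrt{n}$, we may write

(29)    $F_\nu(z_1, z_2, \ldots, z_k) = \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v(z_\kappa - \bar z_\nu)}$.

Here, $\bar z_\nu$ is defined by (20) and, on the right-hand side, $z_{k+1}$ is zero. These
functions $F_\nu(z_1, z_2, \ldots, z_k)$ for $\nu = 1, 2, 3, \ldots$ have all the properties required
in Lemma C of Part I: At the point $z_1 = z_2 = \cdots = z_k = 0$ one has $F_\nu = 1$,
the first derivatives are

        $\frac{\partial F_\nu}{\partial z_\iota} = v p_{\nu\iota} - v p_{\nu\iota} \sum_{\kappa=1}^{k+1} p_{\nu\kappa} = 0$

and the second derivatives, $(\iota \ne \kappa)$,

(30)    $\frac{\partial^2 F_\nu}{\partial z_\iota^2} = v^2 p_{\nu\iota} (1 - p_{\nu\iota})^2 + v^2 p_{\nu\iota}^2 \sum_{\lambda \ne \iota} p_{\nu\lambda} = v^2 p_{\nu\iota} (1 - p_{\nu\iota})$,

        $\frac{\partial^2 F_\nu}{\partial z_\iota\, \partial z_\kappa} = v^2 p_{\nu\iota} (1 - p_{\nu\iota})(-p_{\nu\kappa}) + v^2 p_{\nu\kappa} (-p_{\nu\iota})(1 - p_{\nu\kappa}) + v^2 p_{\nu\iota} p_{\nu\kappa} \sum_{\lambda \ne \iota, \kappa} p_{\nu\lambda} = -v^2 p_{\nu\iota} p_{\nu\kappa}$.

The third derivatives are certainly bounded in any finite region of the $z$-space,
and this means also in any finite region of the $t$-space.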

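The matrix of the second derivatives except for the factor $v^2$ is exactly that
defined in eq. (19) of Part I:

(31)    $P^{(\nu)}_{\iota\kappa} = p_{\nu\iota}\, \delta_{\iota\kappa} - p_{\nu\iota}\, p_{\nu\kappa}$

and the arithmetical means of the derivatives form the matrix in eq. (23) of
Part I:

(31')    $\bar P_{\iota\kappa} = \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\, \delta_{\iota\kappa} - \frac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\, p_{\nu\kappa}$.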
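
The matrices (31) and (31') can be formed mechanically from the interval probabilities. The sketch below is a plain illustration with invented function names; it also exhibits the property used repeatedly later on, namely that all row sums vanish when the probabilities of each distribution add up to 1 (the "complete" case).

```python
def single_P(p):
    """Matrix p_i * delta_ik - p_i * p_k of eq. (31) for one distribution."""
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


def mean_P(p_rows):
    """Arithmetical mean of the matrices (31) over all distributions, eq. (31')."""
    n = len(p_rows)
    k = len(p_rows[0])
    total = [[0.0] * k for _ in range(k)]
    for p in p_rows:
        m = single_P(p)
        for i in range(k):
            for j in range(k):
                total[i][j] += m[i][j] / n
    return total


# Two unequal distributions over three intervals that cover the whole range.
P_bar = mean_P([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])
row_sums = [sum(row) for row in P_bar]
```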


Applying Lemma C we find

(32)    $G_n = G_n\left( \frac{z_1}{\sqrt{n}}, \frac{z_2}{\sqrt{n}}, \ldots, \frac{z_k}{\sqrt{n}} \right) \sim \exp\left[ \frac{v^2}{2} \sum_{\iota,\kappa} \bar P_{\iota\kappa}\, z_\iota z_\kappa \right]$.

This is valid in any finite $t$-region. Since it has been shown at the end of the
foregoing section that, for small $|v|$, the outside contribution to the integral
(15) converges uniformly (for all $n$) towards zero, we are allowed to introduce
(32) in (15). Writing


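(33)    $\sum_{\iota,\kappa} \bar P_{\iota\kappa}\, z_\iota z_\kappa = \sum_{\iota,\kappa} \gamma_{\iota\kappa}\, t_\iota t_\kappa$,    whereby    $\gamma_{\iota\kappa} = \sum_{\lambda,\mu} \bar P_{\lambda\mu}\, \alpha_{\iota\lambda}\, \alpha_{\kappa\mu}$,

equation (15) becomes

(34)    $Q_n(u) \sim (2\pi)^{-k/2} \int\!\!\int \cdots \int \exp\left[ -\tfrac12 \sum_\kappa t_\kappa^2 + \frac{ui}{2} \sum_{\iota,\kappa} \gamma_{\iota\kappa}\, t_\iota t_\kappa \right] dt_1\, dt_2 \cdots dt_k$.

Now, it is well known that if $m_{\iota\kappa}$ is any positive definite matrix with the
determinant $|m_{\iota\kappa}|$, then

(35)    $(2\pi)^{-k/2} \int\!\!\int \cdots \int \exp\left[ -\tfrac12 \sum_{\iota,\kappa} m_{\iota\kappa}\, t_\iota t_\kappa \right] dt_1\, dt_2 \cdots dt_k = \frac{1}{\sqrt{|m_{\iota\kappa}|}}$.

This is likewise true if the matrix $m_{\iota\kappa}$, which we also call $M$, has the form
$M = M_1 - \lambda M_2$ where $M_1$ is positive definite, $M_2$ arbitrary (complex) and $|\lambda|$
sufficiently small. Thus, the integration formula (35) applies to (34) and the result
is reached, for small $|u|$:

(36)    $Q_n(u) \sim Q(u) = \frac{1}{\sqrt{D(ui)}}$    with    $D(\lambda) = |\delta_{\iota\kappa} - \lambda \gamma_{\iota\kappa}|$.

If the $\alpha_{\iota\kappa}$ which transform the given quadric into a sum of squares are known,
(36) with (33) supply the solution of our problem.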

The formula (36) is susceptible of several useful transformations. Let us
write $A$ for the matrix $\alpha_{\iota\kappa}$, $A'$ for the transposed matrix, and $\Psi$, $P$, $\Gamma$, $I$
respectively for the matrices $\psi_{\iota\kappa}$, $\bar P_{\iota\kappa}$, $\gamma_{\iota\kappa}$, $\delta_{\iota\kappa}$. Then, obviously

(37)    $\Psi = A'A$,    $\Gamma = APA'$,    $M = I - ui\,\Gamma$.

If we multiply $M$ by $A'$ to the left and by $A$ to the right, we obtain

(38)    $A'MA = A'IA - ui\, A'APA'A = \Psi - ui\, \Psi P \Psi$.


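In this operation the determinant of $M$ is multiplied by $|\psi_{\iota\kappa}|$. Thus $D(\lambda)$
can be written as

(39)    $D(\lambda) = \frac{|\psi_{\iota\kappa} - \lambda \gamma'_{\iota\kappa}|}{|\psi_{\iota\kappa}|}$    with    $\gamma'_{\iota\kappa} = \sum_{\lambda,\mu} \psi_{\iota\lambda}\, \bar P_{\lambda\mu}\, \psi_{\mu\kappa}$.

Here, the knowledge of the $\alpha_{\iota\kappa}$ is no longer required.

If the matrix (38) is multiplied twice by $\Psi^*$, the inverse of $\Psi$, we find $\Psi^* - ui\,P$
and, therefore,

(40)    $D(\lambda) = |\psi_{\iota\kappa}| \times |\psi^*_{\iota\kappa} - \lambda \bar P_{\iota\kappa}|$.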

As $P$ is positive definite and real, it follows that all roots of $D(\lambda) = 0$,
the “Eigenwerte” of $\Gamma$, are real numbers. Therefore, $D^{-1/2}(ui)$ is a regular
function along the real axis in the $u$-plane. Thus, (36) which was proved so
far for small $|u|$ only remains valid for all real values of $u$: The c.f. of the
asymptotic distribution is represented by $D^{-1/2}(ui)$ for all real $u$-values.
Multiplying (38) only once by $\Psi^*$ we obtain one of the two forms

(41)    $I - ui\, \Psi P$    or    $I - ui\, P \Psi$

which lead to

(42)    $D(\lambda) = |\delta_{\iota\kappa} - \lambda s_{\iota\kappa}| = |\delta_{\iota\kappa} - \lambda s_{\kappa\iota}|$,    $s_{\iota\kappa} = \sum_\mu \bar P_{\iota\mu}\, \psi_{\mu\kappa}$.

Although this formula has been derived by means of $\Psi^*$, it can be seen by
continuity considerations that it remains valid whatever the (symmetric) matrix
$\psi_{\iota\kappa}$ is. The formula makes it clear that the asymptotic distribution of the
quadric is completely determined by the “Eigenwerte” of the matrix
$S = P\Psi$. This bears out our second main theorem in Chapter II, as far as
quadrics of the form (8) are concerned. It will be seen in sec. 5 how (42) applies
to the continuous case.

We, finally, apply to (36) a transformation that is valid only if $P$ has an inverse
matrix $P^*$. (As shown in Part I, sec. 4 this is not the case if the $k$ intervals to
which the subscripts $1, 2, \ldots, k$ refer cover the whole range of the variables
$x_1, x_2, \ldots, x_n$). Multiplying (41) by $P^*$ we find the matrix $P^* - ui\,\Psi$ and
thus

(43)    $D(\lambda) = |\bar P_{\iota\kappa}| \times |P^*_{\iota\kappa} - \lambda \psi_{\iota\kappa}|$.
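This is equivalent to

(44)    $Q(u) = (2\pi)^{-k/2}\, |P^*_{\iota\kappa}|^{1/2} \int\!\!\int \cdots \int \exp\left[ -\tfrac12 \sum_{\iota,\kappa} P^*_{\iota\kappa}\, \xi_\iota \xi_\kappa + \frac{ui}{2} \sum_{\iota,\kappa} \psi_{\iota\kappa}\, \xi_\iota \xi_\kappa \right] d\xi_1\, d\xi_2 \cdots d\xi_k$.

According to the definition of the characteristic function eq. (44) can be
interpreted as stating that

(45)    $(2\pi)^{-k/2}\, |P^*_{\iota\kappa}|^{1/2} \exp\left[ -\tfrac12 \sum_{\iota,\kappa} P^*_{\iota\kappa}\, \xi_\iota \xi_\kappa \right]$

is the asymptotic probability density for the simultaneous occurrence of $\xi_1$,
$\xi_2, \ldots, \xi_k$. The expression (45) can be arrived at by applying the Central
Limit Theorem to the case of $k$ independent chance variables. Since, however,
$P^*$ does not exist in general, eq. (44) would not be a suitable point of departure
for developing the theory that concerns us here.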

4. Asymptotic value of $P_n(x)$, illustrations. The relationship between the
c.f. and the c.d.f. of a distribution is well known and need not be discussed here
in detail. We shall use, in this section, two aspects of this relationship only.
First, the continuity theorem, first proved by G. Pólya [5], stating that if the
c.f. $Q_n(u)$ tend towards a limiting function $Q(u)$, the corresponding c.d.f. $P_n(x)$
tend towards the $P(x)$ that corresponds to $Q(u)$. Second, the additivity, i.e.
if $Q(u)$ is of the form $\alpha Q'(u) + \beta Q''(u)$ with $\alpha + \beta = 1$, then $P(x)$ is
$\alpha P'(x) + \beta P''(x)$ with the $P'(x)$, $P''(x)$ corresponding to $Q'(u)$ and $Q''(u)$
respectively. The following three groups of examples will illustrate the
application of the foregoing results.

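a) Let us first consider a function of two excess values $\xi_1$, $\xi_2$ only

(46)    $\chi = \frac{n}{2} f = \frac{n}{2} (A\xi_1^2 + 2B\xi_1\xi_2 + C\xi_2^2)$

where the matrix $\Psi$ is given by $\psi_{11} = A$, $\psi_{12} = \psi_{21} = B$, $\psi_{22} = C$. The product
matrix $P\Psi$ is

(47)    $\begin{pmatrix} A\bar P_{11} + B\bar P_{12} & B\bar P_{11} + C\bar P_{12} \\ A\bar P_{21} + B\bar P_{22} & B\bar P_{21} + C\bar P_{22} \end{pmatrix}$

and the determinant of $I - \lambda P\Psi$

(48)    $D(\lambda) = 1 - \lambda [A\bar P_{11} + 2B\bar P_{12} + C\bar P_{22}] + \lambda^2 (AC - B^2)(\bar P_{11} \bar P_{22} - \bar P_{12}^2)$.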
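
Formula (48) is the expansion $1 - \lambda\,\mathrm{tr}(P\Psi) + \lambda^2 \det(P\Psi)$ of a two-rowed determinant, and it can be checked against a direct evaluation of $|I - \lambda P\Psi|$. The sketch below does this for one arbitrary numerical choice; the function names and the sample values of $A$, $B$, $C$ and of the probabilities are assumptions made for the illustration only.

```python
def product_matrix(P, Psi):
    """Two-rowed matrix product P * Psi."""
    return [[sum(P[i][m] * Psi[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def D_direct(P, Psi, lam):
    """Determinant of I - lam * P * Psi, evaluated element by element."""
    S = product_matrix(P, Psi)
    return (1 - lam * S[0][0]) * (1 - lam * S[1][1]) - lam * lam * S[0][1] * S[1][0]


def D_eq48(P, Psi, lam):
    """Right-hand side of eq. (48) for symmetric P and Psi."""
    A, B, C = Psi[0][0], Psi[0][1], Psi[1][1]
    linear = A * P[0][0] + 2 * B * P[0][1] + C * P[1][1]
    quadratic = (A * C - B * B) * (P[0][0] * P[1][1] - P[0][1] ** 2)
    return 1 - lam * linear + lam * lam * quadratic


# p_1 = 0.2, p_2 = 0.3 (a third interval carries the remaining 0.5).
P = [[0.2 - 0.04, -0.06], [-0.06, 0.3 - 0.09]]
Psi = [[2.0, 1.0], [1.0, 3.0]]
difference = D_direct(P, Psi, 0.7) - D_eq48(P, Psi, 0.7)
```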

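If $\lambda_1$, $\lambda_2$ are the two real roots of $D(\lambda) = 0$, the asymptotic probability density
of $x$ will be

(49)    $\frac{dP(x)}{dx} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-uix}\, du}{\sqrt{\left(1 - \frac{ui}{\lambda_1}\right)\left(1 - \frac{ui}{\lambda_2}\right)}}$.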




We are particularly interested in the case that $P$ is “complete,” i.e. a matrix
with all horizontal and vertical sums vanishing. Then $\bar P_{11} = -\bar P_{12} = \bar P_{22} = \overline{p_1 p_2}$,
the last term in (48) cancels out and the only Eigenwert is
$\lambda_1 = 1/[(A - 2B + C)\, \overline{p_1 p_2}]$. Here, instead of (49) we have

(50)    $\frac{dP(x)}{dx} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-uix}\, du}{\sqrt{1 - \frac{ui}{\lambda_1}}} = \sqrt{\frac{\lambda_1}{\pi}}\, \frac{e^{-\lambda_1 x}}{\sqrt{x}}$.

This is, with respect to $\sqrt{|x|}$, a Gauss distribution with the variance
$|A - 2B + C|\, \overline{p_1 p_2}/2$.


If, in addition to the assumption that $P$ is “complete” (i.e. in the present case
that $p_{\nu 1} + p_{\nu 2} = 1$ for all $\nu$) the further assumption is made that the two
intervals $I_1$ and $I_2$ cover the whole range of the original chance variables $x_1$, $x_2$,
$x_3, \ldots$, one would have also $\xi_1 + \xi_2 = 0$ and from (46)

        $\chi = \frac{n}{2} (A - 2B + C)\, \xi_1^2$.

In this case, $\sqrt{|\chi|}$ is a linear statistical function and the Central Limit Theorem
leads to the same result as that expressed in (50). It is seen, however, from our
derivation, that (50) holds under wider conditions: If $p_{\nu 1} + p_{\nu 2} = 1$ for all $\nu$,
there may exist another interval $I_3$ within the range of the chance variables
$x_1, x_2, x_3, \ldots$ so that $\xi_1 + \xi_2$ is not necessarily zero.

The latter remark suggests the following general theorem: If $f$ is a function
of the $k$ variables $\xi_1, \xi_2, \ldots, \xi_k$ and $g$ another such function but vanishing when
$\xi_1 + \xi_2 + \cdots + \xi_k = 0$, then $f$ and $f + g$ have the same asymptotic distribution
provided that for each $\nu$ the sum $p_{\nu 1} + p_{\nu 2} + \cdots + p_{\nu k} = 1$. In the case of
quadrics this result is equivalent to the following matrix theorem: If $P$, $\Psi$, $\Lambda$
are symmetric matrices, $P$ with all horizontal and vertical sums equal to zero,
$\Psi$ arbitrary, and $\Lambda$ of the form $\Lambda_{\iota\kappa} = a_\iota + a_\kappa$ then the two products

(51)    $P\Psi$    and    $P(\Psi + \Lambda)$

have the same characteristic roots. This can be proved by the usual methods
of matrix calculus. The matrix $P\Lambda$ has all characteristic roots equal to zero.²
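
The matrix theorem around (51) can be tried out numerically. Two square matrices of order $k$ have the same characteristic roots exactly when the traces of their first $k$ powers agree (by Newton's identities), so the sketch below compares those traces in exact rational arithmetic. This is an illustration on one example, not a proof, and the helper names and the sample matrices are assumptions.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def power_traces(m):
    """Traces of m, m^2, ..., m^k for a matrix of order k."""
    traces, cur = [], m
    for _ in range(len(m)):
        traces.append(sum(cur[i][i] for i in range(len(m))))
        cur = matmul(cur, m)
    return traces


def complete_P(p):
    """p_i * delta_ik - p_i * p_k; all row and column sums vanish if sum(p) = 1."""
    k = len(p)
    return [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


P = complete_P([F(1, 5), F(3, 10), F(1, 2)])
Psi = [[2, 1, 0], [1, 3, -1], [0, -1, 4]]        # arbitrary symmetric matrix
a = [1, -2, 3]
Lam = [[a[i] + a[j] for j in range(3)] for i in range(3)]
Psi_plus = [[Psi[i][j] + Lam[i][j] for j in range(3)] for i in range(3)]

traces_plain = power_traces(matmul(P, Psi))
traces_shifted = power_traces(matmul(P, Psi_plus))
traces_lam = power_traces(matmul(P, Lam))
```

For this example the two trace lists coincide and all power traces of $P\Lambda$ vanish, in agreement with the statement that $P\Lambda$ has only zero characteristic roots.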

² A proof of the matrix theorem has meanwhile been published by Alfred Brauer, Bull.
Amer. Math. Soc., Vol. 53 (1947), pp. 605-607.

b) In the definition of Karl Pearson's test function which is usually called
$\chi^2$, it is presumed that a sample is drawn from the combination of $n$ equal
distributions. In this case all $P^{(\nu)}$ are equal and coincide with $\bar P$ which then can
simply be written $P$:

(52)    $P_{\iota\kappa} = p_\iota\, \delta_{\iota\kappa} - p_\iota\, p_\kappa$.

The chance variable we now consider will be

(53)    $\chi = \frac{n}{2} \sum_\iota \frac{\xi_\iota^2}{p_\iota}$.

Thus $\psi_{\iota\kappa} = \delta_{\iota\kappa}/p_\iota$ and the elements of $P\Psi$ are

(53')    $(P\Psi)_{\iota\kappa} = \sum_\mu P_{\iota\mu}\, \psi_{\mu\kappa} = \delta_{\iota\kappa} - p_\iota$.

The matrix $I - \lambda P\Psi$ has the elements

        $\delta_{\iota\kappa}(1 - \lambda) + \lambda p_\iota$.

If the $k$th column is subtracted from any one of the others, only two terms
remain, one equal to $1 - \lambda$ and one equal $-(1 - \lambda)$ in the last row. Thus, the
determinant $D(\lambda)$ includes $(k - 1)$ times the factor $(1 - \lambda)$. On the other hand,
$D(\lambda)$ is of degree $(k - 1)$ and has the absolute term 1. Therefore

(54)    $D(\lambda) = (1 - \lambda)^{k-1}$.

This supplies the $\chi^2$-distribution with $(k - 1)$ “degrees of freedom”

(55)    $Q(u) = (1 - ui)^{-(k-1)/2}$,    $\frac{dP(x)}{dx} = \frac{x^{(k-3)/2}\, e^{-x}}{\Gamma\left(\frac{k-1}{2}\right)}$.

Again, our result is slightly more general than that reached in the usual theory.
It includes the case that in addition to the $k$ intervals with the probabilities
$p_1, p_2, \ldots, p_k$ (whose sum is 1) there are other intervals with probability zero.
On the other hand, if to $\chi^2$ a term of the form $n \sum (a_\iota + a_\kappa)\, \xi_\iota \xi_\kappa$ is added, this
would not change the asymptotic distribution.
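
The determinant result (54) can be confirmed in exact arithmetic for any particular set of probabilities. The sketch below builds the matrix with elements $\delta_{\iota\kappa}(1 - \lambda) + \lambda p_\iota$ and evaluates its determinant by Gaussian elimination over rational numbers; the helper names and the sample probabilities are assumptions made for the illustration.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [row[:] for row in m]
    n = len(a)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return d


def pearson_D(p, lam):
    """Determinant of the matrix with elements delta_ik*(1 - lam) + lam*p_i."""
    k = len(p)
    m = [[(1 - lam if i == j else Fraction(0)) + lam * p[i] for j in range(k)]
         for i in range(k)]
    return det(m)


p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]   # probabilities adding up to 1
values = {lam: pearson_D(p, lam)
          for lam in (Fraction(0), Fraction(1, 3), Fraction(2), Fraction(-1))}
```

For three intervals each value should equal $(1 - \lambda)^2$, the case $k = 3$ of (54).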

One may ask for other quadratic functions of $\xi_1, \xi_2, \ldots, \xi_k$ whose asymptotic
distribution is given by (55). In particular, one might be interested in a
generalization of $\chi^2$ for the case of unequal original distributions. The answer can easily
be given by introducing the cofactors of order $(k - 1)$ and of order $(k - 2)$ of the
determinant $|\bar P_{\iota\kappa}|$. It was mentioned in sec. 4 of Part I that all cofactors of
order $(k - 1)$, in the case of “complete” $P$, have the same value. It may be
denoted by $\Delta$. The cofactor corresponding to the lines $\iota$, $\kappa$ and the columns
$\lambda$, $\mu$ will be denoted by $\Pi_{\iota\kappa;\lambda\mu}$, with $\Pi = 0$ if $\iota = \kappa$ or $\lambda = \mu$. Then, if $l$ is any one
of the integers $1, 2, \ldots, k$

(56)    $\psi_{\iota\kappa} = \frac{1}{\Delta}\, \Pi_{\iota l;\kappa l}$,    $\iota, \kappa \ne l$

is one possible solution. In fact, the product $P\Psi$ has in this case the elements

(57)    $(P\Psi)_{\iota\kappa} = \delta_{\iota\kappa}$,    for $\iota, \kappa \ne l$
                     $= -1$,    for $\iota = l$, $\kappa \ne l$
                     $= 0$,    for $\kappa = l$.

The determinant of $I - \lambda P\Psi$ is then seen to equal $(1 - \lambda)^{k-1}$.

The solution (56), however, is unsymmetrical in the sense that it does not
include any terms with $\xi_l$. A completely symmetrical solution in which all
$\xi$ play the same role is given by

(58)    $\psi_{\iota\kappa} = \frac{1}{k\Delta} \sum_{l=1}^{k} \Pi_{\iota l;\kappa l}$.

According to (57) the matrix $P\Psi$ now consists of terms $(k - 1)/k$ in the
principal diagonal and $-1/k$ at all other places, that is

(58')    $(P\Psi)_{\iota\kappa} = \delta_{\iota\kappa} - \frac{1}{k}$.

In the same way as in the case of (53') it can be seen that the determinant of
$I - \lambda P\Psi$ equals here $(1 - \lambda)^{k-1}$. The asymptotic distribution of the quadric with
the coefficients (58) is, therefore, the $\chi^2$-distribution with $(k - 1)$ degrees of freedom.

If the formula (58) is applied to the case of equal $P^{(\nu)}$ the corresponding
quadric becomes




that is, $\chi^2$ + a term vanishing with $\xi_1 + \xi_2 + \cdots + \xi_k$. One can easily modify
(58) so that it leads to $\chi^2$ without any addition.

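c) A third group of examples where the asymptotic density is expressed by
simple functions is that where $D(\lambda)$ is an exact square, that is, all characteristic
roots (except the one that is zero) have even multiplicities. Let us assume
$k = 2m + 1$ and let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be $m$ double roots. Then

(59)    $Q(u) = \prod_{\mu=1}^{m} \frac{1}{1 - ui/\lambda_\mu} = \sum_{\mu=1}^{m} \frac{a_\mu}{1 - ui/\lambda_\mu}$

with

        $a_\mu = \prod_{\nu \ne \mu} \frac{\lambda_\nu}{\lambda_\nu - \lambda_\mu}$

and therefore

(60)    $\frac{dP(x)}{dx} = \sum_{\mu=1}^{m} a_\mu \lambda_\mu\, e^{-\lambda_\mu x}$.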

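Assume, for instance, that all original distributions are uniform, that is

        $P^{(\nu)}_{\iota\kappa} = P_{\iota\kappa} = \frac{1}{k}\, \delta_{\iota\kappa} - \frac{1}{k^2}$

and that the quadric $f$ is given in the form (11) with the following $\alpha_{\iota\kappa}$:

(61)    $\alpha_{\iota\kappa} = \sqrt{kc_1}$    for $\iota = 1$
                 $= \sqrt{kc_\iota}$    for $\iota > 1$, $\kappa = 1, 2, \ldots, \iota - 1$
                 $= -(\iota - 1)\sqrt{kc_\iota}$    for $\iota > 1$, $\kappa = \iota$
                 $= 0$    for $\iota > 1$, $\kappa = \iota + 1, \iota + 2, \ldots, k$.

Then, the $\gamma_{\iota\kappa}$ as defined in (33) become

(62)    $\gamma_{\iota\kappa} = c_\iota\, \iota(\iota - 1)\, \delta_{\iota\kappa}$    for $\iota$ or $\kappa > 1$
                 $= 0$    for $\iota = \kappa = 1$

and $D(\lambda)$ according to (36) takes the form

(63)    $D(\lambda) = |\delta_{\iota\kappa} - \lambda \gamma_{\iota\kappa}| = \prod_{\iota=2}^{k} [1 - \lambda c_\iota\, \iota(\iota - 1)]$.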

In other terms, for the quadric

        $f = kc_1(\xi_1 + \cdots + \xi_k)^2 + kc_2(\xi_1 - \xi_2)^2 + kc_3(\xi_1 + \xi_2 - 2\xi_3)^2 + \cdots$
              $+ kc_k[\xi_1 + \xi_2 + \cdots + \xi_{k-1} - (k - 1)\xi_k]^2$

the characteristic $\lambda$-values are $1/c_\iota\, \iota(\iota - 1)$.

Now, to obtain the case of $m$ double roots with $k = 2m + 1$ we have simply
to choose

        $c_2 = 3c_3$,  $3c_4 = 5c_5$,  $5c_6 = 7c_7$,  $\ldots$.

The first term on the right-hand side can be entirely omitted in accordance to
what was said in connection with (51). Besides, for the same reason, the
expression can be simplified in various ways by assuming $\xi_1 + \xi_2 + \cdots + \xi_k = 0$.
As a numerical example, take $k = 5$, $c_2 = 3$, $c_3 = 1$, $c_4 = 5$, $c_5 = 3$. Then

        $f = 20(\xi_1^2 + \xi_2^2 + \xi_3^2 + 20\xi_4^2 + 20\xi_5^2 - \xi_1\xi_2 - \xi_1\xi_3 - \xi_2\xi_3 + 10\xi_4\xi_5)$

leads to the characteristic values $\lambda = 1/6$ and $1/60$ and the asymptotic density
becomes

        $\frac{dP(x)}{dx} = \frac{1}{54} \left( e^{-x/60} - e^{-x/6} \right)$.
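
The numerical example can be retraced step by step: the characteristic values follow from $1/c_\iota\, \iota(\iota - 1)$, and for double roots the density is a weighted sum of exponentials whose weights come from a partial-fraction decomposition of the characteristic function. The sketch below is an illustration with invented function names; the partial-fraction weights are computed in the standard way and are an assumption about how the density is organised, not a quotation of the printed formula.

```python
import math
from fractions import Fraction


def characteristic_values(c):
    """Values 1 / (c_i * i * (i - 1)) for a dict {i: c_i}, i >= 2."""
    return [Fraction(1, ci * i * (i - 1)) for i, ci in sorted(c.items())]


def double_root_density(lams):
    """Weights and density for the c.f. prod 1 / (1 - u*i/lam) over distinct lams."""
    weights = []
    for m, lm in enumerate(lams):
        a = 1.0
        for j, lj in enumerate(lams):
            if j != m:
                a *= lj / (lj - lm)   # partial-fraction coefficient
        weights.append(a)

    def density(x):
        return sum(a * l * math.exp(-l * x) for a, l in zip(weights, lams))

    return weights, density


all_values = characteristic_values({2: 3, 3: 1, 4: 5, 5: 3})
distinct = sorted(set(all_values), reverse=True)          # each occurs twice
weights, density = double_root_density([float(v) for v in distinct])
```

With the constants of the example, the two distinct values are 1/6 and 1/60, and `density(x)` reproduces $(e^{-x/60} - e^{-x/6})/54$.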

In a similar way other groups of quadrics with asymptotic distributions of
the type (60) can easily be constructed. One may, for instance, use eq. (41)
and make vanish, in the matrix $S = P\Psi$, all elements on one side of the diagonal
so that the roots are immediately known.


5. Transition to the continuous case. In this concluding section, the
transition to the case of a quadric of the form (1) with continuous $\psi(x, y)$ will be
outlined. The formula best fit for this purpose is eq. (36). We therefore
suppose the statistical function $f$ given as

(64)    $f = \iint \psi(x, y)\, dT_n(x)\, dT_n(y)$    with    $\psi(x, y) = \int \alpha(r, x)\, \alpha(r, y)\, dr$.

In analogy to (33) we derive

(65)    $\gamma(x, y) = \iint \alpha(x, s)\, \alpha(y, t)\, dU_n(s, t)$
                    $= \int \alpha(x, s)\, \alpha(y, s)\, d\bar V_n(s) - \iint \alpha(x, s)\, \alpha(y, t)\, dW_n(s, t)$.

Since $dW$ is symmetric, this function $\gamma(x, y)$ is symmetric with respect to $x$ and
$y$. If $D(\lambda)$ denotes the Fredholm determinant of the “kernel” $\gamma(x, y)$, we
conclude from (36) that the characteristic function of the asymptotic distribution
of $f$ will be given by

(66)    $Q_n(u) \sim \frac{1}{\sqrt{D(ui)}}$

if certain convergence conditions are satisfied.

In order to establish (66) the main point is to find a sequence of functions
$\psi_1(x, y)$, $\psi_2(x, y), \ldots$ each of the type considered in the foregoing Sections and
such that 1) the distribution of the quadric $f_k$ with the coefficients $\psi_k$ tends
towards the distribution of $f$ with increasing $k$ and independently of $n$; and 2) that
the determinants $D_k$ corresponding to $\psi_k$ converge towards $D$ as $k$ increases
indefinitely. Using our Lemma A we can replace the first condition by asking
that the expectation of $|f - f_k|$ should go to zero with $k \to \infty$ independently of $n$.

The following assumptions shall be made concerning $f$ and the $V_\nu(x)$: The
function $\alpha(r, x)$ in (64) is continuous and bounded in every finite region; there
exist two positive continuous functions $a(r)$, $\beta(x)$ such that

(67)    $|\alpha(r, x)| \le a(r)\, \beta(x)$

and that the integrals

(68)    $\int a^2(r)\, dr = M$,    $\int \beta(x)\, dV_\nu(x)$,    $\int \beta^2(x)\, dV_\nu(x)$

exist, the latter two being bounded and converging uniformly with respect to
$\nu$. We are going to devise a step function $\psi_k(x, y)$ so that for the corresponding
$f_k$ and any positive $\epsilon_1$

(69)    $E\{|f - f_k|\} \le \epsilon_1$.

Let $N$ be an upper bound of the integrals

(70)    $\int \beta(x)\, dV_\nu(x) \le N$,    $\int \beta^2(x)\, d\bar V_n(x) \le N$

and $\epsilon = \epsilon_1/(5 + 8N)$. Choose a value $L$ such that

(71)    $\int_{|x|>L} \beta(x)\, dV_\nu(x) \le \frac{\epsilon}{M}$,    $\int_{|x|>L} \beta^2(x)\, dV_\nu(x) \le \frac{\epsilon}{M}$

and, calling $B$ the maximum of $\beta(x)$ in $|x| \le L$, another quantity $R$ such that

(72)    $\int_{|r|>R} a^2(r)\, dr \le \frac{\epsilon}{2B^2}$.

We subdivide, in the $x$-$y$-$r$-space, the domain $|x| \le L$, $|y| \le L$, $|r| \le R$ in
$k^3$ equal cells where $k$ is determined by the condition that the absolute value
of the variation of $\alpha(r, x)\alpha(r, y)$ within each cell does not exceed $\epsilon/4R$. Outside
this domain we set $\alpha_k(r, x) = 0$ while inside the domain $\alpha_k(r, x)\alpha_k(r, y)$ shall
equal the value that $\alpha(r, x)\alpha(r, y)$ assumes in the center of the respective cell.
Then $\psi_k(x, y)$ will be defined by

(73)    $\psi_k(x, y) = \int \alpha_k(r, x)\, \alpha_k(r, y)\, dr$.

From the definition of $k$ and from (67) and (72) it follows that

(74)    $|\psi(x, y) - \psi_k(x, y)| \le \int_{|r| \le R} |\alpha(r, x)\alpha(r, y) - \alpha_k(r, x)\alpha_k(r, y)|\, dr + \int_{|r|>R} |\alpha(r, x)\alpha(r, y)|\, dr$
                    $\le 2R\, \frac{\epsilon}{4R} + \beta(x)\beta(y) \int_{|r|>R} a^2(r)\, dr \le \frac{\epsilon}{2} + B^2\, \frac{\epsilon}{2B^2} = \epsilon$

as long as $|x| \le L$, $|y| \le L$. If this square is called $(L)$ and the
complementary region $(\bar L)$ we have


(75)    $f - f_k = \iint_{(L)} [\psi(x, y) - \psi_k(x, y)]\, dT_n(x)\, dT_n(y) + \iint_{(\bar L)} \psi(x, y)\, dT_n(x)\, dT_n(y)$

and since the integral of $|dT_n(x)\, dT_n(y)|$ is not larger than 4, while, according
to (64) and (67)

(76)    $|\psi(x, y)| \le \beta(x)\beta(y) \int a^2(r)\, dr = M\beta(x)\beta(y)$

we conclude from (74) and (75)

(77)    $|f - f_k| \le 4\epsilon + M \iint_{(\bar L)} \beta(x)\beta(y)\, |dT_n(x)\, dT_n(y)|$.

This gives

(78)    $E\{|f - f_k|\} \le 4\epsilon + M \iint_{(\bar L)} \beta(x)\beta(y)\, E\{|dT_n(x)\, dT_n(y)|\}$.

Now, from $|dT_n| = |dS_n - d\bar V_n| \le dT_n + 2\, d\bar V_n$ and from the formulas
derived in Part II,

        $E\{dT_n(x)\} = 0$,    $E\{dT_n(x)\, dT_n(y)\} = \frac{1}{n}\, dU_n(x, y)$

it follows

(79)    $E\{|dT_n(x)\, dT_n(y)|\} \le \frac{1}{n}\, dU_n(x, y) + 4\, d\bar V_n(x)\, d\bar V_n(y)$

with

(79')    $dU_n(x, y) = \delta(x, y)\, d\bar V_n(x) - dW_n(x, y) \le \delta(x, y)\, d\bar V_n(x)$.

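If this is introduced in (78) and (71) taken into account, we find

(80)    $E\{|f - f_k|\} \le 4\epsilon + M\, \frac{1}{n} \int_{|x|>L} \beta^2(x)\, d\bar V_n(x) + 4M \iint_{(\bar L)} \beta(x)\beta(y)\, d\bar V_n(x)\, d\bar V_n(y)$
                    $\le 4\epsilon + \frac{1}{n}\, \epsilon + 4 \times 2N\epsilon \le (5 + 8N)\epsilon = \epsilon_1$

as required in (69).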

On the other hand, it can be seen that the kernel $\gamma(x, y)$ as defined in (65)
is the limit of the sequence $\gamma_k(x, y)$

(81)    $\gamma_k(x, y) = \iint \alpha_k(x, s)\, \alpha_k(y, t)\, dU_n(s, t)$    for $x, y$ in $(R)$
                      $= 0$    for $x, y$ in $(\bar R)$

where $(R)$ means the region $|x| \le R$, $|y| \le R$ and $(\bar R)$ the complementary
region. In fact, from the definition of $k$ and eqs. (67) and (71) one has for $x, y$
in $(R)$:

(82)    $|\gamma(x, y) - \gamma_k(x, y)| \le \frac{\epsilon}{4R} \iint_{(L)} |dU_n(s, t)| + \iint_{(\bar L)} |\alpha(x, s)\, \alpha(y, t)\, dU_n(s, t)|$
                    $\le \frac{\epsilon}{2R} + a(x)a(y) \left[ \int_{|s|>L} \beta^2(s)\, d\bar V_n(s) + \frac{1}{n} \sum_{\nu=1}^{n} \iint_{(\bar L)} \beta(s)\beta(t)\, dV_\nu(s)\, dV_\nu(t) \right]$
                    $\le \frac{\epsilon}{2R} + a(x)a(y)\, \frac{\epsilon}{M}\, (1 + 2N)$.

Since $a(x)$ is bounded, the right-hand side goes to zero with $\epsilon$. Finally, for
$x, y$ in $(\bar R)$ we have

(83)    $|\gamma(x, y) - \gamma_k(x, y)| \le \iint |\alpha(x, s)\, \alpha(y, t)\, dU_n(s, t)|$
                    $\le a(x)a(y) \left[ \int \beta^2(s)\, d\bar V_n(s) + \frac{1}{n} \sum_{\nu=1}^{n} \iint \beta(s)\beta(t)\, dV_\nu(s)\, dV_\nu(t) \right]$.

Here, the two terms in the brackets are bounded, but $a(x)a(y)$ goes to zero as $R$
increases. The conclusion is that $\gamma_k(x, y)$ tends uniformly towards $\gamma(x, y)$
with $k \to \infty$.

Thus, eq. (66) is established provided that the function $\gamma(x, y)$ defined in (65)
has a Fredholm determinant $D(\lambda)$ that is the limit of the corresponding
algebraic determinants and provided that the c.f. $1/\sqrt{D(ui)}$ leads to a c.d.f. with
bounded derivative.

As an example let us consider the case

(84)    $\alpha(r, x) = \sqrt{g'(r)}$    for $r \ge x$
                    $= 0$    for $r < x$.

This function is not continuous as it was assumed in establishing (66). However,
the existence of a single discontinuity line, $x = r$, does not invalidate the
argument. We assume $g'(r) \ge 0$ and equal to $dg/dr$. Then, in the case of
(84):

(85)    $\psi(x, y) = \int \alpha(r, x)\, \alpha(r, y)\, dr = -g(y)$    for $x \le y$
                    $= -g(x)$    for $x \ge y$.

Since, however, adding to $\psi$ a function of $x$ or of $y$ alone does not change the
value of $f$, we can also use

(85')    $\psi(x, y) = g(x)$    for $x \le y$
                     $= g(y)$    for $x \ge y$.

The statistical function $f$ that corresponds to (84) can be computed either from
(85) or (85'), or directly from (84) if we use the formula that follows from (64)

(86)    $f = \int \left[ \int \alpha(r, x)\, dT_n(x) \right]^2 dr$.

The integral in the brackets is, in our case, seen to equal $\sqrt{g'(r)}\, T_n(r)$, thus

(86')    $f = \int g'(r)\, [S_n(r) - \bar V_n(r)]^2\, dr$.

This is exactly the test function $\omega^2$ mentioned in the Introduction, eq. (3).

To find the distribution of $f$ we have to compute $\gamma(x, y)$. Its definition (65)
can be written in the form

(87)    $\gamma(x, y) = \frac{1}{n} \sum_{\nu=1}^{n} \left[ \int \alpha(x, s)\, \alpha(y, s)\, dV_\nu(s) - \int \alpha(x, s)\, dV_\nu(s) \int \alpha(y, s)\, dV_\nu(s) \right]$.

This supplies in the case of (84)

(88)    $\gamma(x, y) = \sqrt{g'(x)g'(y)}\, [\bar V_n(x) - \overline{V_n(x)V_n(y)}]$    for $x \le y$
                    $= \sqrt{g'(x)g'(y)}\, [\bar V_n(y) - \overline{V_n(x)V_n(y)}]$    for $x \ge y$.

Here, the second term in the brackets is the arithmetical mean of the products
$V_\nu(x)V_\nu(y)$:

        $\overline{V_n(x)V_n(y)} = \frac{1}{n} \sum_{\nu=1}^{n} V_\nu(x)\, V_\nu(y)$.

If the distributions $V_\nu(x)$ are all equal (independent of $\nu$) we have simply to
write $V(x)$ instead of $\bar V_n(x)$ and $V(x)V(y)$ instead of $\overline{V_n(x)V_n(y)}$. If, in
addition, the distribution in the original collectives are uniform in the basic interval
0 to 1, one has

(89)    $\gamma(x, y) = \sqrt{g'(x)g'(y)}\; x(1 - y)$    for $0 \le x \le y \le 1$
                    $= \sqrt{g'(x)g'(y)}\; y(1 - x)$    for $0 \le y \le x \le 1$.

This is the case dealt with in Smirnoff's papers [7, 8]. If, finally, $g'(x)$ is
supposed to be equal to 1 in the interval 0, 1, we arrive at a kernel $\gamma(x, y)$ whose
Fredholm determinant is well known:

(90)    $\gamma(x, y) = x(1 - y)$    for $x \le y$
                    $= y(1 - x)$    for $x \ge y$,        $D(\lambda) = \frac{\sin\sqrt{\lambda}}{\sqrt{\lambda}}$.

This supplies immediately the c.f. and (in form of a definite integral) the c.d.f.
of the asymptotic distribution of $\omega^2$ for $g' = 1$.
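
The kernel in (90) can be written as $\min(x, y) - xy$, and its Fredholm determinant $\sin\sqrt{\lambda}/\sqrt{\lambda}$ vanishes at $\lambda_m = m^2\pi^2$. The following sketch checks numerically two standard facts about this symmetric kernel: that $\sin(m\pi x)$ is reproduced by the kernel up to the factor $1/\lambda_m$, and that the infinite product $\prod (1 - \lambda/m^2\pi^2)$ agrees with the closed form. It is an illustration for positive $\lambda$ only; the function names, the midpoint rule and the truncation of the product are assumptions of the example.

```python
import math


def kernel(x, y):
    """x*(1 - y) for x <= y and y*(1 - x) for x >= y."""
    return min(x, y) - x * y


def apply_kernel(m, x, steps=4000):
    """Midpoint value of the integral of kernel(x, y) * sin(m*pi*y) over [0, 1]."""
    h = 1.0 / steps
    return sum(kernel(x, (j + 0.5) * h) * math.sin(m * math.pi * (j + 0.5) * h)
               for j in range(steps)) * h


def fredholm_det(lam):
    """sin(sqrt(lam)) / sqrt(lam) for lam > 0."""
    root = math.sqrt(lam)
    return math.sin(root) / root


def product_form(lam, terms=20000):
    """Truncated product of (1 - lam / (m*pi)^2) over m = 1..terms."""
    value = 1.0
    for m in range(1, terms + 1):
        value *= 1.0 - lam / (m * math.pi) ** 2
    return value


lhs = apply_kernel(2, 0.3)
rhs = math.sin(2 * math.pi * 0.3) / (2 * math.pi) ** 2
```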

The same result can be reached without the use of $\alpha(r, x)$ if we apply one of
the transformations discussed in the foregoing Section. Take, for instance,
instead of $\gamma(x, y)$ the unsymmetric kernel $\sigma(x, y)$ corresponding to the matrix
$S = P\Psi$ defined in (41). If all original distributions are equal, the element of
$S$ can be written as

(91)    $s_{\iota\kappa} = \sum_\mu P_{\iota\mu}\, \psi_{\mu\kappa} = p_\iota\, \psi_{\iota\kappa} - p_\iota \sum_\mu p_\mu\, \psi_{\mu\kappa}$.

Calling $v(x)$ the density $dV(x)/dx$ in the continuous case, the corresponding
kernel becomes

(92)    $\sigma(x, y) = v(x) \left[ \psi(x, y) - \int \psi(s, y)\, v(s)\, ds \right]$.

With the $\psi$-values from (85'), $g' = 1$, $v = 1$, this gives

(92')    $\sigma(x, y) = x - y + \frac{y^2}{2}$    for $x \le y$
                     $= \frac{y^2}{2}$    for $x \ge y$.

It can easily be seen that the “Eigenfunctions” of this $\sigma(x, y)$ are $\sin(\sqrt{\lambda_m}\, x)$
with $\lambda_m = m^2\pi^2$, and, therefore, the Fredholm determinant is that indicated in
(90).

It might be added that the expectation and the asymptotic variance of $\omega^2$
can be computed, independently of the distribution, from the formulas
developed in Part I. The results are

(93)    $n\, E\{\omega^2\} = \int g'(x)\, \overline{V_n(x)[1 - V_n(x)]}\, dx$

and, in the case of all $V_\nu(x)$ equal

(94)    $n^2\, \mathrm{Var}\{\omega^2\} \sim 4 \iint_{x \le y} g'(x)\, g'(y)\, V^2(x)\, [1 - V(y)]^2\, dx\, dy$.

These formulas have already been given in [4].
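
For the simplest case, $g' = 1$ and all distributions uniform on the interval 0 to 1, the two integrals above reduce to $\int_0^1 x(1 - x)\, dx$ and $4 \iint_{x \le y} x^2 (1 - y)^2\, dx\, dy$. The sketch below evaluates both by the midpoint rule; it is an illustration of this special case only, the function names are invented, and cells on the diagonal $x = y$ are given half weight as a simple approximation of the triangular region.

```python
def mean_n_omega2(steps=2000):
    """Midpoint value of the integral of x*(1 - x) over [0, 1]."""
    h = 1.0 / steps
    return sum(((i + 0.5) * h) * (1.0 - (i + 0.5) * h) for i in range(steps)) * h


def var_n_omega2(steps=400):
    """Midpoint value of 4 * double integral of x^2 (1 - y)^2 over 0 <= x <= y <= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(i, steps):
            y = (j + 0.5) * h
            weight = 0.5 if i == j else 1.0   # diagonal cells are half inside
            total += weight * x * x * (1.0 - y) ** 2
    return 4.0 * total * h * h


mean_value = mean_n_omega2()
variance_value = var_n_omega2()
```

The two numbers should come out close to 1/6 and 1/45 respectively.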

Another, more general, remark is this. If all $V_\nu(x)$ are equal, one can reduce
the problem, by a transformation of the original chance variable $x$ into
$x' = V(x)$, to the case of a uniform distribution over the interval 0 to 1. If the
$V_\nu(x)$ are not equal, it might still be possible to find a transformation
$x' = x'(x)$ such that all original distributions extend over a finite region on the
$x'$-axis only. In this case the restrictions concerning the behavior of the
distributions at infinity drop out.

REFERENCES

[1] Harald Cramér, “On the composition of elementary errors,” Skand. Aktuarietidskrift,
        Vol. 11 (1928), pp. 13-74, 141-180.
[2] R. v. Mises, “Les lois de probabilité pour les fonctions statistiques,” Ann. de l'Inst.
        Henri Poincaré, Vol. 6 (1936), pp. 185-212.
[3] R. v. Mises, “Sur les fonctions statistiques,” Soc. math. de France, Conférence de la
        Réunion internat. des Mathématiciens, Paris, 1937.
[4] R. v. Mises, Wahrscheinlichkeitsrechnung und ihre Anwendung, Leipzig and Wien, 1931.
[5] G. Pólya, “Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung,”
        Math. Zeitschr., Vol. 8 (1920), pp. 171-181.
[6] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, Math. Surveys No. 1,
        New York, 1943.
[7] N. V. Smirnoff, “On the distribution of the $\omega^2$-criterion of Mises,” (In Russian),
        Recueil Math., nouvelle série, Vol. 2 (1937), pp. 973-993.
[8] N. V. Smirnoff, “Sur la distribution de $\omega^2$ (criterium de M. von Mises),” Comptes Rendus
        Paris, Vol. 202 (1936), p. 449.
[9] Vito Volterra, Leçons sur les Fonctions de Ligne, Paris, 1913.
[10] Vito Volterra and Joseph Pérès, Théorie générale des Fonctionnelles, Paris, 1936.



APPROXIMATE SOLUTIONS FOR MEANS AND VARIANCES IN A 
CERTAIN CLASS OF BOX PROBLEMS 

By Philip J. McCarthy 
Social Science Research Council 

1. Summary. Consider $n$ boxes, each box having an associated probability,
$p_i$, $(\sum_i p_i = 1)$, and an associated integer, $k_i$. If balls are thrown one by one
into these boxes, the probability being $p_i$ that any one ball falls into the $i$th box,
then the number of balls which must be thrown in order to obtain, for the first
time, at least $k_{i_1}$ balls in the $i_1$th box, at least $k_{i_2}$ balls in the $i_2$th box, $\ldots$, and at
least $k_{i_s}$ balls in the $i_s$th box, is a random variable, $N_s[k_1(p_1), k_2(p_2), \ldots, k_n(p_n)]$.
Here $i_1, i_2, \ldots, i_s$ represent the numbers of that set of $s$ boxes, $(1 \le s \le n)$,
which first satisfies the stated condition.

The distribution of $N_s[k_1(p_1), k_2(p_2), \ldots, k_n(p_n)]$ can be written down for any
set of values assigned to $n$, $s$, the $p_i$'s and the $k_i$'s. However, for $n$ greater than
2 the distribution assumes such an extremely complicated multinomial form
that except for certain special cases even the mean of the distribution cannot
be numerically evaluated without a prohibitive amount of labor.

This paper presents the exact moments of $N_1[k_1(p_1), k_2(p_2)]$ and $N_2[k_1(p_1), k_2(p_2)]$
in forms that readily lend themselves to computation and shows how these
moments can be used to obtain approximate values for the mean and variance
for certain situations where $n$ is greater than two. These approximation
formulae are given for

1. The mean and variance, for any $n$ and any set of $k_i$'s and $p_i$'s when $s = 1$
or $n$.

2. The mean, for any $n$ and $2 \le s \le n - 1$, when $p_i = 1/n$, $k_i = k$,
$(i = 1, 2, \ldots, n)$.

Some indications are given concerning the error of the approximations, and the
circumstances which lead to a minimum (and maximum) error. Curves have
been prepared to show the mean for the two box case, the primary function
of these curves being to assist in the application of the approximation formulae.
Some problems where the results of this paper might be applicable are suggested
in the Introduction.

2. Introduction. A box problem is defined when one is given a fixed number 
of boxes, a collection of balls (either finite or infinite), a set of rules governing 
the throwing of the balls into the boxes and a statement of the conditions which 
will bring the throwing to an end. The terminating conditions usually state 
either that a fixed number of balls will be thrown or that balls will be thrown 
until a particular distribution of balls in the boxes has been obtained. In the 
first of these, interest is centered on the possible distributions which can be ob- 

349 



350 


PHILIP J. MCCARTHY 


tained, while in the latter the number of balls necessary to obtain a specified 
distribution is of primary interest. 

This paper will be concerned with certain problems falling in the latter category.
In the simplest case one is given two boxes with associated probabilities
$p_1$ and $p_2$ and associated integers $k_1$ and $k_2$. Balls are thrown one by one into
the two boxes, the probability being $p_1$ that any one ball goes in the first box and
$p_2$ that it goes in the second box. This process is stopped when either $k_1$ balls
fall in box 1 or $k_2$ balls in box 2, whichever occurs first. One is interested in the
distribution of the number of balls necessary to terminate the throwing. This 
problem was stated in essentially this form by Laplace [4], but he contented 
himself with merely writing down the probability generating function. 

Here the special case of two boxes will be treated in detail and the results 
will then be generalized to the n-box case. In all of these instances it is possible
to write down exact expressions for the mean and variance of the number of
balls required to achieve the stated distribution. However, in almost every
case the resulting expressions are too complicated to be of any use when a numerical
answer is desired. The principal portion of this paper will be devoted to
obtaining approximate formulae from which numerical answers can be obtained 
for these problems. Some evaluation of the degree of approximation will be 
given in section 5, while curves to facilitate the computation will be given in 
section 6. 

The statement of these problems in terms of boxes and balls may lead one to
the belief that they have no other interpretation. Actually this is not the case,
and a few illustrations of this point will now be given. For example, consider
the curtailed single sampling plan used in acceptance sampling. A buyer receives
a lot of articles. This lot will contain a certain proportion of defective
items. The buyer wishes to determine on the basis of sampling whether to
accept or reject the lot. His knowledge of his own situation will allow him to
specify the largest proportion of defectives which he is ordinarily willing to
accept and the risk he is willing to take of accepting a lot with a proportion
defective larger than this critical proportion. On the basis of these two values it
is possible to set up a sampling plan in which the buyer will take a sample of size
$n$ out of the lot, inspect it, and reject the lot if there are $k_1$ or more defectives in the
sample. Of course once he has obtained $k_1$ defectives there is no need to inspect
the remainder of the sample. The lot will then be automatically rejected.
Similarly, once he has obtained $n - k_1 + 1$ non-defectives, he can accept the lot
without inspecting the remainder of the items. The average number of items which
he must inspect in order to reach a decision is given by the solution to the two
box problem stated above. Box 1 will receive the defective items, the associated
integer being $k_1$ and the associated probability being $p_1$, the true proportion
of defectives in the lot. Box 2 will receive the non-defective items, the
associated integer being $n - k_1 + 1$ and the associated probability being $p_2$, the true
proportion of non-defectives in the lot.

Laplace [4] considered problems of this type as applied to games of chance.
Thus suppose there are two players A and B who participate in successive trials
of a given event, the probability being $p_1$ that A wins on any one trial and $p_2$
that B wins. Then one can associate the integer $k_1$ with A and $k_2$ with B by
saying that A wins the match if he wins $k_1$ trials before B wins $k_2$ trials and
conversely. The analysis is exactly the same as for the two box problem. It is
apparent that this same situation can be extended to any number of players.

Another possible interpretation is as a particular kind of random walk problem.
Let a particle start at the origin of a system of rectangular coordinates
and suffer successive positive unit displacements, the probability being $p_1$ that
it moves one unit in the $x$-direction and $p_2$ that it moves one unit in the
$y$-direction. Furthermore assume that it is absorbed if it ever reaches the line
$x = k_1$ or the line $y = k_2$. Then the analysis of the above two box problem
gives the mean number of displacements before it is absorbed. In the same
manner, such a random walk problem can be stated for $n$ dimensions. For $n$
equal to three, there will be three planes and the particle will be absorbed when
it reaches any one of the three.

Certain problems in public opinion polling may fit into this category of box
problems, particularly if the above problem is rephrased so that one requires
the mean number of trials to obtain at least $k_1$ balls in the first box and at least
$k_2$ balls in the second box, for the first time. For example, suppose that one
desires to sample from a population composed of two types of individuals,
A and B. Let the population proportions of A and B be known and be denoted
by $p_1$ and $p_2$. Then if one wishes to obtain at least $k_1$ individuals of type
A and at least $k_2$ individuals of type B, the average number of persons who must
be chosen in order to fulfill this condition is given by the analysis of the
corresponding box problem. This is rather artificial when there are only two
categories and $p_1 + p_2 = 1$. However, these restrictions will be removed in the
course of the paper, and the problem will be considered for any number of types
of individuals.

As a final example, consider one of the many bombing problems which arose
during the course of war research. Suppose that a factory which is to be
demolished has $n$ vital units, the destruction of any one of which will destroy the
usefulness of the factory. Let the probability be $p_1$ of hitting the first unit with
a single bomb, $p_2$ the probability of hitting the second with a single bomb, etc.,
and assume that $k_1$ bomb hits will finish off the first unit, $k_2$ the second, etc.
Then the mean number of bombs required will be given by the analysis for the
corresponding box problem.

Corresponding interpretations are possible for the other problems which are 
to be considered in this paper. Some of these will be indicated as the analysis 
proceeds and it is to be hoped that others will occur to the reader. 

As previously noted, this paper will be concerned with the distribution of the number of balls
necessary to terminate the throwing, assuming the $p$'s are known. Another
possible interpretation is to assume the $p$'s unknown and to estimate them with
the results of the ball throwing. Certain aspects of this problem for two boxes
have been considered by J. B. S. Haldane [3] and Girshick, Mosteller and Savage [2].

3. Solution for the two box case. 

3.1. Distribution and moments of the number of trials necessary to obtain either
$k_1$ balls in the first box or $k_2$ balls in the second box. This problem may be stated
as follows: Suppose one is given two boxes with associated probabilities $p_1$ and
$p_2$, and associated integers $k_1$ and $k_2$. For the present it will be assumed that
$p_1 + p_2 = 1$, although this restriction will be removed later. Now let balls be
thrown one by one into these two boxes, the probability being $p_1$ that a particular
ball will fall in the first box and $p_2$ that it will fall in the second box. This
process is stopped on the first ball which leaves either $k_1$ balls in the first box or
$k_2$ balls in the second box. The number of balls, $x$, which is required to accomplish
this is a random variable and we desire the moments of $x$. The probability
that $k_1$ balls are obtained in the first box on the $x$th throw, $k_1 \le x \le k_1 + k_2 - 1$,
before $k_2$ balls are obtained in the second box, is immediately seen to be

$$
\binom{x-1}{k_1-1}\, p_1^{k_1}\, p_2^{x-k_1}. \tag{3.1}
$$

Similar reasoning gives the probability that $k_2$ balls are obtained in the second
box for the first time on the $x$th throw, $k_2 \le x \le k_1 + k_2 - 1$, as

$$
\binom{x-1}{k_2-1}\, p_1^{x-k_2}\, p_2^{k_2}. \tag{3.2}
$$

From (3.1) and (3.2), the $h$th moment of $x$, $E(x^h)$, is

$$
\sum_{x=k_1}^{k_1+k_2-1} x^h \binom{x-1}{k_1-1} p_1^{k_1} p_2^{x-k_1}
+ \sum_{x=k_2}^{k_1+k_2-1} x^h \binom{x-1}{k_2-1} p_1^{x-k_2} p_2^{k_2}. \tag{3.3}
$$
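As a numerical illustration of (3.1) to (3.3), the following sketch tabulates the distribution of $x$ and sums it to obtain moments. The function names `stopping_pmf` and `raw_moment` are hypothetical, introduced only for this illustration, and $p_1 + p_2 = 1$ is assumed as in the text.

```python
from math import comb

def stopping_pmf(k1, p1, k2, p2):
    """Return {x: P(throwing stops on throw x)} from (3.1) and (3.2).

    Assumes p1 + p2 = 1 and positive integers k1, k2.
    """
    pmf = {}
    for x in range(min(k1, k2), k1 + k2):
        prob = 0.0
        if x >= k1:  # (3.1): box 1 receives its k1-th ball on throw x
            prob += comb(x - 1, k1 - 1) * p1**k1 * p2**(x - k1)
        if x >= k2:  # (3.2): box 2 receives its k2-th ball on throw x
            prob += comb(x - 1, k2 - 1) * p1**(x - k2) * p2**k2
        pmf[x] = prob
    return pmf

def raw_moment(pmf, h):
    """E(x^h) as the finite sum (3.3)."""
    return sum(x**h * prob for x, prob in pmf.items())

pmf = stopping_pmf(3, 0.3, 2, 0.7)
print(sum(pmf.values()))   # total probability, equal to 1 up to rounding
print(raw_moment(pmf, 1))  # mean number of throws
```

For $k_1 = k_2 = 2$ and $p_1 = p_2 = 1/2$ the throwing stops on throw 2 or throw 3 with probability 1/2 each, so the mean is 2.5; this small case is convenient for checking any implementation.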

However, it is inconvenient to consider (3.3) directly. A much simpler procedure
is to determine the increasing factorial moments of $x$ and then transform
these into the ordinary moments. Thus the $h$th increasing factorial moment of
$x$, $F_{h,1}[k_1(p_1), k_2(p_2)]$, is defined as $E[x(x+1) \cdots (x+h-1)]$. Then $F_{h,1}[\ ]$
is equal to

$$
\sum_{x=k_1}^{k_1+k_2-1} \frac{(x+h-1)!}{(x-1)!} \binom{x-1}{k_1-1} p_1^{k_1} p_2^{x-k_1}
+ \sum_{x=k_2}^{k_1+k_2-1} \frac{(x+h-1)!}{(x-1)!} \binom{x-1}{k_2-1} p_1^{x-k_2} p_2^{k_2}. \tag{3.4}
$$

(3.4) can be transformed by means of the relationship

$$
\sum_{j=0}^{a} \binom{k+j}{j} p_1^{\,j} = (1-p_1)^{-(k+1)}\, I_{1-p_1}(k+1,\ a+1), \tag{3.5}
$$

where $I_x(p, q)$ is the Incomplete Beta-Function as tabulated by Karl Pearson
[6], and the result is obtained that

$$
F_{h,1}[k_1(p_1), k_2(p_2)] = \frac{k_1(k_1+1) \cdots (k_1+h-1)}{p_1^{h}}\, I_{p_1}(k_1+h,\ k_2)
+ \frac{k_2(k_2+1) \cdots (k_2+h-1)}{p_2^{h}}\, I_{p_2}(k_2+h,\ k_1). \tag{3.6}
$$


The ordinary $h$th moment of $x$ may be written in terms of $F_{1,1}[\ ], F_{2,1}[\ ], \ldots,$
$F_{h,1}[\ ]$ as

$$
E(x^h) = \sum_{i=1}^{h} \frac{\Delta^i 0^h}{i!}\, (-1)^{h+i}\, F_{i,1}[\ ], \tag{3.7}
$$

where $\Delta^i 0^h$ represents a difference of zero. Tabular values of $\Delta^i 0^h / i!$ are given
by Fisher and Yates [1].

In particular, the mean and variance of $x$, which will receive the special designations
$E_1[k_1(p_1), k_2(p_2)]$ and $\sigma_1^2[k_1(p_1), k_2(p_2)]$ respectively, are

$$
\frac{k_1}{p_1}\, I_{p_1}(k_1+1,\ k_2) + \frac{k_2}{p_2}\, I_{p_2}(k_2+1,\ k_1) \tag{3.8}
$$

and

$$
\frac{k_1(k_1+1)}{p_1^{2}}\, I_{p_1}(k_1+2,\ k_2) + \frac{k_2(k_2+1)}{p_2^{2}}\, I_{p_2}(k_2+2,\ k_1)
- E_1[k_1(p_1), k_2(p_2)] - \{E_1[k_1(p_1), k_2(p_2)]\}^2. \tag{3.9}
$$
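The closed forms (3.8) and (3.9) can be evaluated without tables when $k_1$ and $k_2$ are integers. The sketch below is an illustration under that assumption: it replaces the tabulated Incomplete Beta-Function by the standard binomial-tail identity $I_x(a, b) = P(\mathrm{Bin}(a+b-1, x) \ge a)$ for positive integers $a$, $b$. The names `inc_beta_int`, `mean_e1` and `var_e1` are hypothetical.

```python
from math import comb

def inc_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    computed as the upper tail of a Binomial(a + b - 1, x) distribution."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def mean_e1(k1, p1, k2, p2):
    """Mean number of throws, formula (3.8); assumes p1 + p2 = 1."""
    return (k1 / p1 * inc_beta_int(p1, k1 + 1, k2)
            + k2 / p2 * inc_beta_int(p2, k2 + 1, k1))

def var_e1(k1, p1, k2, p2):
    """Variance, formula (3.9): second factorial moment minus mean minus mean squared."""
    f2 = (k1 * (k1 + 1) / p1**2 * inc_beta_int(p1, k1 + 2, k2)
          + k2 * (k2 + 1) / p2**2 * inc_beta_int(p2, k2 + 2, k1))
    m = mean_e1(k1, p1, k2, p2)
    return f2 - m - m * m

print(mean_e1(2, 0.5, 2, 0.5), var_e1(2, 0.5, 2, 0.5))
```

With $k_1 = k_2 = 2$ and equal probabilities the stopping time is 2 or 3 with probability 1/2 each, so the mean should be 2.5 and the variance 0.25.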


In the event the $p$'s are equal and sum to one, $E_1[k_1(p_1), k_2(p_2)]$ will be abbreviated
to $E_1[k_1, k_2]$, and finally, if both the $p$'s and $k$'s are equal, it will be written as
$E_1[k^2]$. In this two box situation, the only other possibility is $E_2[k_1(p_1), k_2(p_2)]$,
which will denote the expected number of balls required to obtain at least $k_1$
in the first box and at least $k_2$ in the second box, for the first time. This problem
will be considered in section 3.2.

In order to facilitate the computation of mean values, both for the two box 
problem itself and for its application to problems involving a larger number of 
boxes, (3.8) has been graphed for various values of $k_1$, $k_2$, $p_1$ and $p_2$. A discussion
of this procedure and the results obtained will be found in section 6.

There is one further result which will later prove useful. Consider the situation
when there is only one box with $p_1$ and $k_1$, $p_1 < 1$. This is the same as
having two boxes where the $k_2$ corresponding to the second box is infinite. In
other words, one can terminate the throwing of balls only because of what happens
to the first box, never because of anything that happens to the second box.
In this case one obtains

$$
E_1[k_1(p_1), \infty(p_2)] = \sum_{x=k_1}^{\infty} x \binom{x-1}{k_1-1} p_1^{k_1} p_2^{x-k_1} = \frac{k_1}{p_1}. \tag{3.10}
$$

Similarly,

$$
\sigma_1^2[k_1(p_1), \infty(p_2)] = \frac{k_1 p_2}{p_1^{2}}. \tag{3.11}
$$


3.2. Distribution and moments of the number of throws necessary to obtain at
least $k_1$ balls in the first box and at least $k_2$ balls in the second box. This problem
may be stated as follows: Suppose there are two boxes with associated probabilities
$p_1$ and $p_2$, and associated integers $k_1$ and $k_2$. As in 3.1, $p_1 + p_2 = 1$.
Let balls be thrown into the boxes one by one, the probability being $p_1$ that a
particular ball will fall in the first box and $p_2$ that it will fall in the second box.
This process is stopped on the first ball which leaves at least $k_1$ in the first box
and exactly $k_2$ in the second or at least $k_2$ in the second and exactly $k_1$ in the first.
Again $x$ is the number of balls required to accomplish this. As explained in
3.1, the mean value in this case will be written as $E_2[k_1(p_1), k_2(p_2)]$. The analysis
follows through as in 3.1 and the mean number of trials is equal to

$$
\sum_{x=k_1+k_2}^{\infty} x \binom{x-1}{k_1-1} p_1^{k_1} p_2^{x-k_1}
+ \sum_{x=k_1+k_2}^{\infty} x \binom{x-1}{k_2-1} p_1^{x-k_2} p_2^{k_2}. \tag{3.12}
$$

Making use of (3.5), this can be written as

$$
\frac{k_1}{p_1}\,[1 - I_{p_1}(k_1+1,\ k_2)] + \frac{k_2}{p_2}\,[1 - I_{p_2}(k_2+1,\ k_1)]. \tag{3.13}
$$

Referring to (3.8) it is evident that

$$
E_1[k_1(p_1), k_2(p_2)] + E_2[k_1(p_1), k_2(p_2)] = \frac{k_1}{p_1} + \frac{k_2}{p_2}. \tag{3.14}
$$
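Relation (3.14) can be checked numerically without using the Incomplete Beta-Function at all. The sketch below sums the finite series for $E_1$ and a truncated version of the infinite series (3.12) for $E_2$. The function names and the truncation rule (stop once the terms are negligible and the index is well past the bulk of the distribution) are assumptions made for this illustration.

```python
from math import comb

def e1_direct(k1, p1, k2, p2):
    """Mean for 'either k1 or k2': finite sum of x times (3.1) plus (3.2)."""
    return sum(
        x * (comb(x - 1, k1 - 1) * p1**k1 * p2**(x - k1) if x >= k1 else 0.0)
        + x * (comb(x - 1, k2 - 1) * p1**(x - k2) * p2**k2 if x >= k2 else 0.0)
        for x in range(min(k1, k2), k1 + k2)
    )

def e2_series(k1, p1, k2, p2, tol=1e-15):
    """Mean for 'at least k1 and at least k2': series (3.12), truncated."""
    total, x = 0.0, k1 + k2
    cutoff = 2 * (k1 / p1 + k2 / p2)  # heuristic: well past the bulk of the mass
    while True:
        term = x * (comb(x - 1, k1 - 1) * p1**k1 * p2**(x - k1)
                    + comb(x - 1, k2 - 1) * p1**(x - k2) * p2**k2)
        total += term
        if x > cutoff and term < tol:
            return total
        x += 1

k1, p1, k2, p2 = 3, 0.3, 2, 0.7
print(e1_direct(k1, p1, k2, p2) + e2_series(k1, p1, k2, p2), k1 / p1 + k2 / p2)
```

The two printed numbers should agree to within the truncation error of the series.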


The $h$th increasing factorial moment in this problem, denoted by $F_{h,2}[k_1(p_1), k_2(p_2)]$, is

$$
\frac{k_1(k_1+1) \cdots (k_1+h-1)}{p_1^{h}}\,[1 - I_{p_1}(k_1+h,\ k_2)]
+ \frac{k_2(k_2+1) \cdots (k_2+h-1)}{p_2^{h}}\,[1 - I_{p_2}(k_2+h,\ k_1)]. \tag{3.15}
$$

Comparison of (3.15) with (3.6) gives the relationship

$$
F_{h,1}[\ ] + F_{h,2}[\ ] = \frac{k_1(k_1+1) \cdots (k_1+h-1)}{p_1^{h}}
+ \frac{k_2(k_2+1) \cdots (k_2+h-1)}{p_2^{h}}. \tag{3.16}
$$

The ordinary moments of $x$ can be computed from (3.15) by the use of (3.7).
That is, formula (3.7) holds in this case if $F_{i,1}[\ ]$ is replaced by $F_{i,2}[\ ]$.



It can be easily shown by the use of the recursion relationship for the Incomplete
Beta-Function,

$$
I_x(p, q) = x\, I_x(p-1,\ q) + (1-x)\, I_x(p,\ q-1),
$$

that $F_{h,1}[\ ]$ and $F_{h,2}[\ ]$ satisfy the partial difference equation

$$
\begin{aligned}
F_{h,i}[k_1(p_1), k_2(p_2)] = {} & h\, F_{h-1,i}[k_1(p_1), k_2(p_2)] \\
& + p_1\, F_{h,i}[(k_1-1)(p_1),\ k_2(p_2)] \\
& + p_2\, F_{h,i}[k_1(p_1),\ (k_2-1)(p_2)],
\end{aligned} \tag{3.17}
$$

where $i = 1$ or 2. This equation can be used as an alternative way of obtaining
many results, examples of which are (3.10) and (3.11). Certain of these applications
have been discussed by McCarthy [5].
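The difference equation (3.17) lends itself to direct computation. The sketch below evaluates $F_{h,1}$ recursively for the "either" problem. The boundary values are assumptions added for this illustration and are not stated in the text: $F_{0,1} = 1$ (an empty product), and $F_{h,1} = 0$ for $h \ge 1$ when either integer is zero, since no throw is then needed. The name `factorial_moment` is hypothetical.

```python
from functools import lru_cache

def factorial_moment(h, k1, p1, k2, p2):
    """F_{h,1}[k1(p1), k2(p2)] computed from the difference equation (3.17).

    Assumes p1 + p2 = 1 and the boundary values described above.
    """
    @lru_cache(maxsize=None)
    def f(order, a, b):
        if order == 0:
            return 1.0   # assumed: zeroth factorial moment is an empty product
        if a == 0 or b == 0:
            return 0.0   # assumed: nothing left to obtain, so x = 0
        return (order * f(order - 1, a, b)
                + p1 * f(order, a - 1, b)
                + p2 * f(order, a, b - 1))
    return f(h, k1, k2)

mean = factorial_moment(1, 2, 0.5, 2, 0.5)
second = factorial_moment(2, 2, 0.5, 2, 0.5)
print(mean, second - mean - mean**2)  # mean and variance of x
```

The variance is recovered as $F_{2,1} - F_{1,1} - F_{1,1}^2$, the same combination that appears in (3.9).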

4. Solution for the n box case. 

4.1. Preliminary discussion. The problems of this section, although direct
generalizations of the two box cases, can perhaps be most easily stated and
illustrated as applied to the behavior of a random particle. Suppose that
we have a random particle which starts at the origin of $n$-dimensional rectangular
coordinates and moves in unit steps along the positive coordinate axes. At
any given point the probability will be taken as $p_i$ that it moves in the $x_i$-direction.
$\sum_{i=1}^{n} p_i$ is assumed to be one unless otherwise specified. Now consider the
$n$ hyperplanes, $x_i = k_i$, and assume that the particle will be absorbed if it passes
through a specified number, say $s$, of these hyperplanes. Notice that we are
interested only in the number of planes which it passes through, and not in the
particular ones. For each $s$, ($s = 1, 2, \ldots, n$), the number of moves which the
particle makes before it is absorbed is a random variable, and in this section we
will be concerned with the distribution of this random variable. The corresponding
interpretation for boxes and balls is immediately obvious.

These problems are seen to be generalizations of the two box cases considered 
in section 3. Although it is always relatively easy to write down formal expressions
for the quantities to be considered, the step from two boxes to three
or more boxes produces expressions which are extremely difficult, or even impossible,
to evaluate. In this section we shall develop approximate solutions
which make use only of simple computations based on the solution for the two 
box case. 

As an introduction to the contents of this section, we shall discuss briefly a 
box problem which is a special case of the general problem. Assume that there 
are n boxes with a probability of 1/n that any one ball will be thrown into a 
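particular one of the $n$ boxes. Then one can ask for the mean and variance of
the number of trials required to obtain $s$ occupied boxes (i.e. $k_1 = k_2 = \cdots = k_n = 1$).
Making use of (3.10) and (3.11), we obtain

$$
\begin{aligned}
E_1[1^n] &= 1, \\
E_2[1^n] &= 1 + E_1\!\left[1\!\left(\tfrac{n-1}{n}\right),\ \infty\!\left(\tfrac{1}{n}\right)\right] = 1 + \frac{n}{n-1}, \\
E_3[1^n] &= 1 + \frac{n}{n-1} + \frac{n}{n-2}, \\
E_s[1^n] &= 1 + \frac{n}{n-1} + \cdots + \frac{n}{n-s+1} = n \sum_{i=0}^{s-1} \frac{1}{n-i},
\end{aligned} \tag{4.1}
$$

and

$$
\begin{aligned}
\sigma_1^2[1^n] &= 0, \\
\sigma_2^2[1^n] &= 0 + \sigma_1^2\!\left[1\!\left(\tfrac{n-1}{n}\right),\ \infty\!\left(\tfrac{1}{n}\right)\right] = 0 + \frac{n}{(n-1)^2}, \\
\sigma_3^2[1^n] &= 0 + \frac{n}{(n-1)^2} + \frac{2n}{(n-2)^2}, \\
\sigma_s^2[1^n] &= 0 + \frac{n}{(n-1)^2} + \frac{2n}{(n-2)^2} + \cdots + \frac{(s-1)n}{(n-s+1)^2} = n \sum_{i=1}^{s-1} \frac{i}{(n-i)^2}.
\end{aligned} \tag{4.2}
$$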

The solution for this problem for $s = n$ is given in Uspensky [9], but a straightforward
solution requires a great deal of formal manipulation. The step-by-step
procedure used here is somewhat indicative of the methods to be used in the
succeeding portions of this paper. 
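The closed forms (4.1) and (4.2) are easy to evaluate exactly. The short sketch below does so with rational arithmetic; the function names `mean_occupied` and `var_occupied` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def mean_occupied(n, s):
    """E_s[1^n] from (4.1): mean number of throws until s of n
    equally likely boxes each hold at least one ball."""
    return n * sum(Fraction(1, n - i) for i in range(s))

def var_occupied(n, s):
    """sigma_s^2[1^n] from (4.2)."""
    return n * sum(Fraction(i, (n - i) ** 2) for i in range(1, s))

# Example: throws of a fair die until all six faces have appeared.
print(mean_occupied(6, 6), var_occupied(6, 6))
```

Each term of the sums is the mean or variance of a geometric waiting time for the next previously empty box, which is the step-by-step argument used in the text.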

4.2. Mean and variance of the number of trials required to obtain either $k_1$ balls
in the first box, or $k_2$ in the second, $\ldots$, or $k_{n-1}$ in the $(n-1)$st, the probability
associated with the $n$th box being non-zero. The mean number of trials in this
particular problem is represented by $E_1[k_1(p_1), \ldots, k_{n-1}(p_{n-1}), \infty(p_n)]$. The
formal expression for this quantity is

$$
\sum_{i=1}^{n-1} \sum_{j=k_i}^{\infty} j\, \frac{(j-1)!}{(k_i-1)!\,(j-k_i)!}\, p_i^{k_i}
\sum \frac{(j-k_i)!}{r_1! \cdots r_{i-1}!\, r_{i+1}! \cdots r_n!}\,
p_1^{r_1} \cdots p_{i-1}^{r_{i-1}}\, p_{i+1}^{r_{i+1}} \cdots p_n^{r_n}, \tag{4.3}
$$

where the third sum is taken over all values of the $r$'s such that

$$
r_1 + \cdots + r_{i-1} + r_{i+1} + \cdots + r_n = j - k_i,
$$

$$
r_1 < k_1,\ \ldots,\ r_{i-1} < k_{i-1},\ r_{i+1} < k_{i+1},\ \ldots,\ r_{n-1} < k_{n-1}.
$$

This expression can be reduced by one dimension by the application of some of
the results for two boxes. Consider for the moment only those balls going into
the first $(n-1)$ boxes. Then the number of balls (conditional) which is necessary
to obtain either $k_1$ in the first box, or $k_2$ in the second, $\ldots$, or $k_{n-1}$ in the
$(n-1)$st box is a random variable $X$ which takes on values

$$
k_1,\ k_1 + 1,\ \ldots,\ k_1 + k_2 + \cdots + k_{n-1} - (n-2)
$$

with corresponding probabilities $\pi_j$, where with no loss of generality it is assumed
that $k_1 \le k_2 \le \cdots \le k_{n-1}$. $\pi_j$ is given by a sum of $(n-1)$ multinomial
expressions, the probability associated with the $i$th box now being $p_i \big/ \sum_{j=1}^{n-1} p_j$,
which will be designated by $p_i'$.

Under these circumstances it is apparent that

$$
E_1[k_1(p_1), \ldots, \infty(p_n)] = \sum_j \pi_j\, E_1[x_j(p_1 + \cdots + p_{n-1}),\ \infty(p_n)]. \tag{4.4}
$$

However, (3.10) can be applied to each term in (4.4), leading to

$$
\frac{1}{(p_1 + p_2 + \cdots + p_{n-1})} \sum_j \pi_j x_j. \tag{4.5}
$$

Now from the definition of $\pi_j$ and $x_j$ we have

$$
E_1[k_1(p_1), \ldots, k_{n-1}(p_{n-1}), \infty(p_n)]
= \frac{1}{(p_1 + p_2 + \cdots + p_{n-1})}\, E_1[k_1(p_1'), k_2(p_2'), \ldots, k_{n-1}(p_{n-1}')]. \tag{4.6}
$$

Similarly, the application of (3.11) gives the result that

$$
\sigma_1^2[k_1(p_1), \ldots, k_{n-1}(p_{n-1}), \infty(p_n)]
= \frac{p_n}{(p_1 + p_2 + \cdots + p_{n-1})^2}\, E_1[k_1(p_1'), \ldots, k_{n-1}(p_{n-1}')]
+ \frac{1}{(p_1 + p_2 + \cdots + p_{n-1})^2}\, \sigma_1^2[k_1(p_1'), \ldots, k_{n-1}(p_{n-1}')]. \tag{4.7}
$$

These results are of immediate importance for two reasons: 

1. They indicate that by combining boxes and introducing a new random
variable, certain problems can be simplified. This statement will be expanded
and the principle applied repeatedly in the later portions of this paper.

2. With respect to the section on two boxes, they mean that the restriction
$p_1 + p_2 = 1$ is not necessary for the solution of the problems. One can always
assume that $p_\infty\ (= 1 - p_1 - p_2)$ refers to a box which receives balls but which
otherwise has no effect on the outcome of an experiment. In this paper it has
been convenient to refer to such a box as having an infinite capacity.

4.3. The mean value and variance of the number of trials required in a two box
problem when one or both of the constants $k_1$ and $k_2$ are replaced by random variables.
The discussion in 4.2 has indicated that the idea of associating a random variable
with a box instead of a single integer may sometimes lead to simplification.
Here this procedure will be treated in more detail. Consider $E_1[k_1(p_1), k_2(p_2)]$
and assume that $k_1$ is replaced by a random variable $X$ which can take on values
$x_1, x_2, \ldots, x_t$ with corresponding probabilities $\pi_1, \ldots, \pi_i, \ldots, \pi_t$. Under
these circumstances $E_1[\ ]$ itself becomes the random variable $E_1[X(p_1), k_2(p_2)]$,
taking on values $E_1[x_i(p_1), k_2(p_2)]$, ($i = 1, 2, \ldots, t$), with corresponding probabilities
$\pi_i$. The mean value of this new random variable can be formally
written down as

$$
E(E_1[X(p_1), k_2(p_2)]) = \sum_{i=1}^{t} \pi_i\, E_1[x_i(p_1), k_2(p_2)]. \tag{4.8}
$$

This expression can always be calculated from the probabilities $\pi_i$ and (3.8)
or from the curves given in section 6. However, in the applications which will
arise later in this paper, this computation would be very time consuming. Instead,
an approximation to (4.8) will now be derived which will prove to yield
very good results, and which can be obtained by a simple reading on the above
mentioned curves.

If $X$ is regarded as a continuous variable, then $E_1[X(p_1), k_2(p_2)]$ is a continuous
function of $X$, and, in fact, can be represented by a single curve similar
to those appearing in section 6. Moreover, as is apparent from (3.8), repeated
differentiation of $E_1[X(p_1), k_2(p_2)]$ yields continuous derivatives. Consequently,
$E_1[X(p_1), k_2(p_2)]$ can be expanded in Taylor series about $a$, where $a = \sum_{i=1}^{t} \pi_i x_i$.
This procedure gives

$$
E(E_1[X(p_1), k_2(p_2)]) = \sum_{i=1}^{t} \pi_i \sum_{j=0}^{\infty} \frac{(x_i - a)^j}{j!}\, E_1^{(j)}[a(p_1), k_2(p_2)], \tag{4.9}
$$

where $E_1^{(j)}[a(p_1), k_2(p_2)]$ represents the $j$th derivative of $E_1[X(p_1), k_2(p_2)]$ with
respect to $X$ evaluated at $a$. Interchanging the order of summation one obtains

$$
E(E_1[X(p_1), k_2(p_2)]) = \sum_{j=0}^{\infty} \frac{E_1^{(j)}[a(p_1), k_2(p_2)]}{j!} \sum_{i=1}^{t} \pi_i (x_i - a)^j. \tag{4.10}
$$

The final result then becomes

$$
E(E_1[X(p_1), k_2(p_2)]) = \sum_{j=0}^{\infty} \frac{\mu_j}{j!}\, E_1^{(j)}[a(p_1), k_2(p_2)], \tag{4.11}
$$

where $\mu_j$ is the $j$th moment of $X$ about its mean, $a$. Thus to a first approximation

$$
E(E_1[X(p_1), k_2(p_2)]) \approx E_1[a(p_1), k_2(p_2)]. \tag{4.12}
$$

It is of interest to note that if $E_1[X(p_1), k_2(p_2)]$ is linear in $X$ then (4.12) is an
exact expression since all derivatives except the first are zero. Furthermore,
if $E_1[X(p_1), k_2(p_2)]$ is of the second degree in $X$, then only the second non-zero
term on the right hand side of (4.11) needs to be added to (4.12) in order to
make it exact. The former of these is the relation which gave an exact solution
in 4.2.
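A small numerical comparison of (4.8) with (4.12) may help fix ideas. The sketch below is only an illustration under stated assumptions: it extends (3.8) to a non-integer first argument by using the finite-sum form of the Incomplete Beta-Function, which is valid when the second argument is a positive integer, together with the identity $I_{p_2}(k_2+1, a) = 1 - I_{p_1}(a, k_2+1)$ for $p_1 + p_2 = 1$. The function names are hypothetical.

```python
def inc_beta(x, a, b):
    """I_x(a, b) for real a > 0 and positive integer b, as the finite sum
    x^a * sum_{j < b} [a (a+1) ... (a+j-1) / j!] (1 - x)^j."""
    total, term = 0.0, 1.0
    for j in range(b):
        total += term
        term *= (a + j) / (j + 1) * (1 - x)
    return x**a * total

def e1(a, p1, k2, p2):
    """Formula (3.8) with the first integer replaced by a real a > 0.
    k2 must remain a positive integer and p1 + p2 = 1 is assumed."""
    return (a / p1 * inc_beta(p1, a + 1, k2)
            + k2 / p2 * (1 - inc_beta(p1, a, k2 + 1)))

def exact_and_first_order(values, probs, p1, k2, p2):
    """Return the exact mean (4.8) and the first-order value (4.12)."""
    exact = sum(pi * e1(x, p1, k2, p2) for x, pi in zip(values, probs))
    a = sum(pi * x for x, pi in zip(values, probs))
    return exact, e1(a, p1, k2, p2)

# X equals 2 or 3 with probability 1/2 each; p1 = 2/3, k2 = 2, p2 = 1/3.
exact, approx = exact_and_first_order([2, 3], [0.5, 0.5], 2 / 3, 2, 1 / 3)
print(exact, approx)
```

In this example the first-order value lies slightly above the exact mean, as one would expect when the curve of $E_1$ against $X$ bends downward.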

It is important to realize that this analysis for $E(E_1[X(p_1), k_2(p_2)])$ can be
immediately applied to $E(E_2[X(p_1), k_2(p_2)])$. For, by the use of (3.14) and
(4.8), one obtains

$$
E(E_2[X(p_1), k_2(p_2)]) = \frac{a}{p_1} + \frac{k_2}{p_2} - E(E_1[X(p_1), k_2(p_2)]). \tag{4.13}
$$

The same analysis can be applied to $F_{h,1}[\ ]$ and the general result obtained
that

$$
E(F_{h,1}[X(p_1), k_2(p_2)]) \approx F_{h,1}[a(p_1), k_2(p_2)]. \tag{4.14}
$$

This immediately allows one to approximate the variance in the obvious manner.

It is of interest to consider briefly the situation when both $k_1$ and $k_2$ are replaced
by random variables. Let $k_1$ be replaced by $X_1$ taking on values $x_{11}$,
$x_{12}, \ldots, x_{1t}$ with probabilities $\pi_{11}, \pi_{12}, \ldots, \pi_{1t}$ and $k_2$ be replaced by $X_2$
taking on values $x_{21}, x_{22}, \ldots, x_{2s}$ with probabilities $\pi_{21}, \pi_{22}, \ldots, \pi_{2s}$. Then

$$
E(E_1[X_1(p_1), X_2(p_2)]) = \sum_i \sum_j \pi_{1i} \pi_{2j}\, E_1[x_{1i}(p_1), x_{2j}(p_2)], \tag{4.15}
$$

where $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, s$. Again applying Taylor series
and expanding about $a = \sum_i \pi_{1i} x_{1i}$ and $b = \sum_j \pi_{2j} x_{2j}$, the result is obtained
that

$$
E(E_1[X_1(p_1), X_2(p_2)]) = \sum_{u=0}^{\infty} \sum_{v=0}^{\infty} \frac{\mu_{1u}\, \mu_{2v}}{u!\, v!}\, E_1^{(u,v)}[a(p_1), b(p_2)], \tag{4.16}
$$

where $E_1^{(u,v)}[a(p_1), b(p_2)]$ is the $u$th partial derivative with respect to $X_1$ and
the $v$th partial derivative with respect to $X_2$ of $E_1[X_1(p_1), X_2(p_2)]$ evaluated
at $X_1 = a$, $X_2 = b$. This gives the approximate formula

$$
E(E_1[X_1(p_1), X_2(p_2)]) \approx E_1[a(p_1), b(p_2)]. \tag{4.17}
$$


4.4. Mean and variance of the number of trials required to obtain either (at
least) $k_1$ balls in the first box, or (at least) $k_2$ balls in the second box, $\ldots$, or (and
at least) $k_n$ balls in the $n$th box. In accordance with previous notation the mean
number of trials required is given by $E_1[k_1(p_1), k_2(p_2), \ldots, k_n(p_n)]$. The exact
value of this quantity can be written down and it would be a complicated multinomial
expression. The evaluation of such an expression would be extremely
difficult, if not impossible, especially for large values of $k_1, k_2, \ldots, k_n$. In
order to obtain an approximation to $E_1[\ ]$, repeated applications of (4.12) can
be made and the resulting expression can be evaluated by means of the curves
in section 6.

For convenience, consider $E_1[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$. The general
result will then be apparent. Assume that the first three boxes form a single
unit with probability $(p_1 + p_2 + p_3)$. Then the number of balls required to
obtain either $k_1$ in the first, $k_2$ in the second or $k_3$ in the third, if all balls are going
in these three boxes, is a random variable $X$. Consequently,

$$
E_1[k_1(p_1), \ldots, k_4(p_4)] = E(E_1[X(p_1 + p_2 + p_3),\ k_4(p_4)]). \tag{4.18}
$$

Applying (4.12),

$$
E_1[k_1(p_1), \ldots, k_4(p_4)] \approx
E_1\!\left[ E_1\!\left[ k_1\!\left(\tfrac{p_1}{p_1+p_2+p_3}\right),\ k_2\!\left(\tfrac{p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right) \right] (p_1+p_2+p_3),\ k_4(p_4) \right]. \tag{4.19}
$$

Applying (4.12) once again the final approximation is

$$
E_1[k_1(p_1), \ldots, k_4(p_4)] \approx
E_1\!\left[ E_1\!\left[ E_1\!\left[ k_1\!\left(\tfrac{p_1}{p_1+p_2}\right),\ k_2\!\left(\tfrac{p_2}{p_1+p_2}\right) \right] \!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right) \right] (p_1+p_2+p_3),\ k_4(p_4) \right]. \tag{4.20}
$$

Expression (4.20) can be translated into a course of procedure. One considers
the first two boxes and computes

$$
a_2 = E_1\!\left[ k_1\!\left(\tfrac{p_1}{p_1+p_2}\right),\ k_2\!\left(\tfrac{p_2}{p_1+p_2}\right) \right].
$$

It is then assumed that $a_2$ is a new number associated with a box with probability
$(p_1 + p_2)$ and

$$
a_3 = E_1\!\left[ a_2\!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right) \right]
$$

is computed. Repeating this procedure again, one computes $a_4 = E_1[a_3(p_1 + p_2 + p_3), k_4(p_4)]$,
and by (4.20) this is approximately equal to $E_1[k_1(p_1), \ldots, k_4(p_4)]$. This method
of computation is seen to be completely general and one can apply it to any number
of boxes. Each step consists of computing $E_1[\ ]$ for two boxes and consequently
can be carried out with the curves of section 6. It is evident that
the order in which the boxes are taken may have an important effect on the size
of the error involved in using this step-by-step procedure. This problem will
be considered in section 5.
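The step-by-step procedure can be sketched in a few lines, with the curves of section 6 replaced by direct evaluation of (3.8). This is an illustration under stated assumptions rather than the author's computing method: the probabilities are taken to sum to one, the $k_i$ are positive integers, and (3.8) is extended to a non-integer first argument through the finite-sum form of the Incomplete Beta-Function. The names `inc_beta`, `e1` and `stepwise_e1` are hypothetical.

```python
def inc_beta(x, a, b):
    """I_x(a, b) for real a > 0 and positive integer b (finite sum)."""
    total, term = 0.0, 1.0
    for j in range(b):
        total += term
        term *= (a + j) / (j + 1) * (1 - x)
    return x**a * total

def e1(a, p1, k2, p2):
    """Formula (3.8) with real a > 0 in place of k1; assumes p1 + p2 = 1."""
    return (a / p1 * inc_beta(p1, a + 1, k2)
            + k2 / p2 * (1 - inc_beta(p1, a, k2 + 1)))

def stepwise_e1(ks, ps):
    """Approximate E_1[k_1(p_1), ..., k_n(p_n)] by merging boxes one at a time.
    Assumes sum(ps) == 1 and integer ks."""
    a, mass = float(ks[0]), ps[0]
    for k, p in zip(ks[1:], ps[1:]):
        new_mass = mass + p
        # Two-box problem: the merged unit (requirement a) against the next box,
        # with both probabilities rescaled to sum to one.
        a = e1(a, mass / new_mass, k, p / new_mass)
        mass = new_mass
    return a

print(stepwise_e1([2, 2, 2], [1 / 3, 1 / 3, 1 / 3]))
```

For three equally likely boxes with $k_i = 2$ the exact mean can be found by direct enumeration as $1 + 1 + 2/3 + 2/9 = 26/9$; the value printed above should exceed this by roughly one per cent. For two boxes the procedure reduces to (3.8) itself.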

It is of interest to note that one can also obtain another approximation for
$E_1[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$. Suppose that the first two boxes are considered
as one unit and the second two boxes as another unit. Then the number
of balls which must fall in the first two boxes in order to obtain either $k_1$ in
the first box or $k_2$ in the second is a random variable $X_1$. Similarly a random
variable $X_2$ can be associated with the last two boxes. Accordingly

$$
E_1[k_1(p_1), \ldots, k_4(p_4)] = E(E_1[X_1(p_1 + p_2),\ X_2(p_3 + p_4)]). \tag{4.21}
$$

By use of (4.17), (4.21) can be written as

$$
E_1[k_1(p_1), \ldots, k_4(p_4)] \approx
E_1\!\left[ E_1\!\left[ k_1\!\left(\tfrac{p_1}{p_1+p_2}\right),\ k_2\!\left(\tfrac{p_2}{p_1+p_2}\right) \right] (p_1 + p_2),\
E_1\!\left[ k_3\!\left(\tfrac{p_3}{p_3+p_4}\right),\ k_4\!\left(\tfrac{p_4}{p_3+p_4}\right) \right] (p_3 + p_4) \right]. \tag{4.22}
$$

This same analysis applies directly to the factorial moments. In particular

$$
F_{2,1}[k_1(p_1), \ldots, k_4(p_4)] \approx
F_{2,1}\!\left[ E_1\!\left[ E_1\!\left[ k_1\!\left(\tfrac{p_1}{p_1+p_2}\right),\ k_2\!\left(\tfrac{p_2}{p_1+p_2}\right) \right] \!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right) \right] (p_1+p_2+p_3),\ k_4(p_4) \right]. \tag{4.23}
$$

From (4.20) and (4.23) an approximate value for $\sigma_1^2[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$
can be obtained. This procedure is also perfectly general and so an estimate
of $\sigma_1^2[\ ]$ can be obtained for any number of boxes.

This same method can be immediately applied to the approximation of
$E_n[k_1(p_1), \ldots, k_n(p_n)]$. One simply considers the boxes two at a time, computing
$E_2[\ ]$ at each stage instead of $E_1[\ ]$.

4.5. Solution for $E_s[k^n]$ and $E_s[k_1^{n-1}, k_2]$. When $s$ is different from 1 or $n$,
the complexities of the problem force one into the consideration of only the
quantities given in the title of this subsection. The corresponding problem
for three boxes, namely $E_2[k_1(p_1), k_2(p_2), k_3(p_3)]$, has been treated for general
$k_i$ and $p_i$ by McCarthy [5]. However, the resulting expression is so complicated
that it will not be given here.

The process to be used consists of reducing the subscript $s$ by a series of steps
until the subscript 2 is reached. This expression can then be evaluated by the
use of the curves or by simple computation. For the sake of convenience, the
case $E_3[k^4]$ will be considered in detail. It will then be possible to write down the
expression for general $s$ and $n$.

As a starting point, look upon the first three boxes as a single unit. Then
there is a definite probability $\pi_i$ that one of these boxes will have $k$ balls in it for
the first time on the $x_i$th throw into these three boxes and that the other two
boxes of the unit will each have less than $k$ balls. Then if one of the other of
the three boxes has $u$ balls ($u < k$) the third box will have $(x_i - k - u)$ balls,
($x_i - k - u < k$). Meanwhile the fourth box will also have been receiving balls,
and the number in it at this time will be denoted by $j$, ($j = 0, 1, 2, \ldots, \infty$).
For each $x_i$ there is a probability associated with $u$, namely $P(u \mid x_i)$, and another
probability associated with $j$, $P(j \mid x_i)$. For the moment, consider that box
1 has received $k$ balls, box 2 the $(x_i - k - u)$ balls, box 3 the $u$ balls and box 4
the $j$ balls. This numbering is of course immaterial since the situation is symmetric
with respect to the first three boxes.

Now if $j \ge k$, either $(2k + u - x_i)$ balls will be required in the second box or
$(k - u)$ balls in the third box in order to obtain three properly occupied boxes.
On the other hand, if $j < k$, the specified number will be required in any two of
boxes two, three and four. Consequently, with this conditional description of
the situation, the required number of balls necessary to obtain three out of the
four boxes occupied in the proper manner is

$$
x_i + j + E_2[(2k + u - x_i),\ (k - u),\ (k - j)], \tag{4.24}
$$

where $(k - j)$ will be taken as zero if $j$ is greater than or equal to $k$. From this
description, it is evident that the desired mean value may be obtained by summing
(4.24) over all possible values of $x_i$, $j$ and $u$. Therefore

$$
E_3[k^4] = \sum_i \pi_i \left\{ x_i + \sum_{j=0}^{\infty} P(j \mid x_i) \left( j + \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i),\ (k - u),\ (k - j)] \right) \right\}. \tag{4.25}
$$

It is to be noticed that the probabilities inside the $E_2[\ ]$ in (4.24) and (4.25)
do not add to one but only to 3/4. This can be easily remedied by the application
of a formula similar to (4.6) and the result is obtained that

$$
E_3[k^4] = \sum_i \pi_i \left\{ x_i + \sum_{j=0}^{\infty} P(j \mid x_i) \left( j + \frac{4}{3} \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i),\ (k - u),\ (k - j)] \right) \right\}, \tag{4.26}
$$

where each probability inside $E_2[\ ]$ is now 1/3.



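By simple considerations

$$
P(u \mid x_i) = \frac{\dfrac{(x_i - k)!}{u!\,(x_i - k - u)!}}{\displaystyle\sum_u \dfrac{(x_i - k)!}{u!\,(x_i - k - u)!}}, \tag{4.27}
$$

where $u$ and $(x_i - k - u)$ are both less than $k$, and

$$
P(j \mid x_i) = \binom{x_i + j - 1}{j} \left(\tfrac{3}{4}\right)^{x_i} \left(\tfrac{1}{4}\right)^{j}. \tag{4.28}
$$

From (4.27) and (4.28)

$$
\sum_j j\, P(j \mid x_i) = x_i / 3, \tag{4.29}
$$

$$
\sum_u u\, P(u \mid x_i) = \frac{x_i - k}{2}. \tag{4.30}
$$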

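(4.26) can be written as

$$
E_3[k^4] = \sum_i \pi_i x_i + \sum_i \pi_i \sum_j j\, P(j \mid x_i)
+ \frac{4}{3} \sum_i \pi_i \sum_j P(j \mid x_i) \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i),\ (k - u),\ (k - j)]. \tag{4.31}
$$

Finally, making use of (4.29), (4.30), the definition of $x_i$ and $\pi_i$ and the procedure
of replacing random variables inside an $E_2[\ ]$ by their mean values,

$$
E_3[k^4] \approx \frac{4}{3} \left\{ E_1[k^3] + E_2\!\left[ \left(\tfrac{3}{2}k - \tfrac{1}{2}E_1[k^3]\right),\ \left(\tfrac{3}{2}k - \tfrac{1}{2}E_1[k^3]\right),\ \left(k - \tfrac{1}{3}E_1[k^3]\right) \right] \right\}, \tag{4.32}
$$

and this in turn can be written as

$$
E_3[k^4] \approx \frac{4}{3} \left\{ E_1[k^3] + E_2\!\left[ \left(\tfrac{3}{2}k - \tfrac{1}{2}E_1[k^3]\right)^{2},\ \left(k - \tfrac{1}{3}E_1[k^3]\right) \right] \right\}. \tag{4.33}
$$

This method of analysis which has just been applied to $E_3[k^4]$ can be used
equally well for $E_s[k^n]$. Here one simply considers the first $(n-1)$ boxes and
proceeds as above. The final result is immediately apparent, namely that

$$
E_s[k^n] \approx \frac{n}{n-1} \left\{ E_1[k^{n-1}] + E_{s-1}\!\left[ \left(\tfrac{n-1}{n-2}\,k - \tfrac{1}{n-2}\,E_1[k^{n-1}]\right)^{n-2},\ \left(k - \tfrac{1}{n-1}\,E_1[k^{n-1}]\right) \right] \right\}. \tag{4.34}
$$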

It will be noticed that in reducing (4.34) further it will be necessary to consider
expressions of the form $E_s[k_1^{n-1}, k_2]$. However, it will be seen from the foregoing
analysis that no use was made of the fact that the integers attached to the first
$(n-1)$ boxes were the same. Accordingly,

$$
E_s[k_1^{n-1}, k_2] \approx \frac{n}{n-1} \left\{ E_1[k_1^{n-1}] + E_{s-1}\!\left[ \left(\tfrac{n-1}{n-2}\,k_1 - \tfrac{1}{n-2}\,E_1[k_1^{n-1}]\right)^{n-2},\ \left(k_2 - \tfrac{1}{n-1}\,E_1[k_1^{n-1}]\right) \right] \right\}. \tag{4.35}
$$


Now, by the use of (4.34) and (4.35), it is possible to reduce s as much as may be 
desired. 
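When the approximations of this subsection are used, exact values for small cases are useful as a check. The sketch below is not the reduction procedure of the text: it is a brute-force recursion over the numbers of balls in the boxes, practical only when $n$ and the $k_i$ are small, and it assumes the probabilities sum to one. The name `exact_es` is hypothetical.

```python
from functools import lru_cache

def exact_es(ks, ps, s):
    """Exact mean number of throws until s of the boxes hold at least
    k_i balls each, by first-step analysis on the vector of counts."""
    n = len(ks)

    @lru_cache(maxsize=None)
    def e(state):
        full = [r >= k for r, k in zip(state, ks)]
        if sum(full) >= s:
            return 0.0
        # A ball falling into an already full box leaves the state unchanged.
        stay = sum(p for p, f in zip(ps, full) if f)
        total = 1.0
        for i in range(n):
            if not full[i]:
                nxt = state[:i] + (state[i] + 1,) + state[i + 1:]
                total += ps[i] * e(nxt)
        return total / (1.0 - stay)

    return e((0,) * n)

print(exact_es((1, 1, 1, 1), (0.25,) * 4, 3))  # compare with (4.1) for n = 4, s = 3
print(exact_es((2, 2, 2, 2), (0.25,) * 4, 3))  # a small instance of E_3[k^4]
```

With all $k_i = 1$ the result should reproduce (4.1), which gives $4(1/4 + 1/3 + 1/2) = 13/3$ for $n = 4$, $s = 3$.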


5. Some considerations concerning the error of the approximations. 

5.1. Preliminary remarks. This discussion of the errors of the approximations 
given in the preceding sections has been left until now so that a broad perspective
might be gained, and the errors seen in relationship to one another. Such
an arrangement is advantageous in this instance since both the analytical and 
computational results bearing on the subject are scanty, and consequently, 
any intelligent leads which their inter-relationships can give are most helpful. 

The difficulty involved in obtaining exact values for the various quantities 
considered in this paper has been pointed out quite frequently, and the approximations
have been devised to overcome this very difficulty. The same complexity
which prevents the computation of many exact values also prevents any
effective analytic approach to the problem of evaluating the errors. For these
reasons the author has been unable to carry through any general analytic treatment
of the errors of the approximations. However, because the intelligent use
of approximations requires some knowledge of their accuracy, certain isolated 
cases have been investigated by a combination of computational, graphical and 
analytic methods. These investigations are detailed in the remainder of this 
section, and conjectures concerning the general behavior of the errors are made 
whenever possible. As has been stated earlier, no consideration will be given 
to the approximation formulae for the variance. 

5.2. Errors of the approximations for $E_1[k_1(p_1), \ldots, k_n(p_n)]$ and $E_n[k_1(p_1), \ldots, k_n(p_n)]$. Taking $n$ equal to 3, we have from (4.11) that

(5.1)  $\bigl|E_1[k_1(p_1), k_2(p_2), k_3(p_3)] - E_1[a(p_1+p_2), k_3(p_3)]\bigr| \le \tfrac{1}{2}\,\sigma_1^2\bigl[k_1(p_1/(p_1+p_2)), k_2(p_2/(p_1+p_2))\bigr] \cdot \mathrm{Max}\,\bigl|E_1''[X(p_1+p_2), k_3(p_3)]\bigr|$,

where $\mathrm{Max}\,|E_1''[X(p_1+p_2), k_3(p_3)]|$ is the maximum absolute value of the second derivative of $E_1[X(p_1+p_2), k_3(p_3)]$ with respect to $X$, and $a$ is equal to $E_1[k_1(p_1/(p_1+p_2)), k_2(p_2/(p_1+p_2))]$. Now an examination of the curves





given in section 6 indicates that, for fixed $p_3$ and $k_3$, the maximum curvature of $E_1[X(p_1+p_2), k_3(p_3)]$, considered as a function of $X$, is a monotone decreasing function of $k_3$. Since this curvature is negative, this geometric observation is equivalent to

(5.2)  $\mathrm{Max}\,\bigl|E_1''[X(p_1+p_2), (k_3+1)(p_3)]\bigr| < \mathrm{Max}\,\bigl|E_1''[X(p_1+p_2), k_3(p_3)]\bigr|$,

although it is not necessarily true that

$\bigl|E_1''[X_1(p_1+p_2), (k_3+1)(p_3)]\bigr| < \bigl|E_1''[X_1(p_1+p_2), k_3(p_3)]\bigr|$.

Moreover,

(5.3)  $E_1[k_1(p_1), k_2(p_2), k_3(p_3)] < E_1[k_1(p_1), k_2(p_2), (k_3+1)(p_3)]$.

From (5.1), (5.2) and (5.3) one readily obtains that the absolute value of the percentage error of the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is bounded by a function, say $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, which is a monotone decreasing function of $k_3$ as $k_3$ increases. It should be noticed that the results of 4.2 have already shown not only that this upper bound for the percentage error approaches zero as $k_3$ becomes infinite, but also that the absolute difference between the true and approximate values approaches zero as $k_3$ becomes infinite.

Computation of $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is very time consuming because of the difficulty in obtaining $\mathrm{Max}\,|E_1''[X(p_1+p_2), k_3(p_3)]|$, and because the direct computation of $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is laborious when any of $k_1$, $k_2$ and $k_3$ are much larger than 2 or 3. In order to surmount these difficulties and still give some indication of the behavior of $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, the following expedients were adopted:

1. The values of $k_1$, $k_2$ and $k_3$ were each fixed at 5.

2. $\mathrm{Max}\,|E_1''[X(p_1+p_2), k_3(p_3)]|$ was obtained by graphical means, namely drawing the slopes of the appropriate curve in section 6, graphing these slopes and then taking off the maximum slopes of these curves.

3. $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ was replaced by its approximation, $E_1[a(p_1+p_2), k_3(p_3)]$, in the computation of the percentage error. This new bound will be denoted by $U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]$.

4. Carefully chosen values of $U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]$ were plotted on triangular coordinates, and contour lines interpolated and extrapolated to cover in large part the range of $p_1$, $p_2$ and $p_3$.

The use of the third of the above listed assumptions is no detriment to the usefulness of the results since

$\frac{E_{1a}[\ ] - E_1[\ ]}{E_1[\ ]} = \frac{\dfrac{E_{1a}[\ ] - E_1[\ ]}{E_{1a}[\ ]}}{1 - \dfrac{E_{1a}[\ ] - E_1[\ ]}{E_{1a}[\ ]}} \le \frac{U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]}{100 - U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]}$,





where $E_{1a}[\ ] = E_1[a(p_1+p_2), k_3(p_3)]$ and $E_1[\ ] = E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$. Since $U_1^*[\ ]$ is a monotone decreasing function of $k_3$, this new bound on the percentage error is also monotone decreasing for increasing $k_3$. Absolute values were not required in this derivation since $E_{1a}[\ ]$ is always greater than or equal to $E_1[\ ]$, as is apparent from (5.1) and an examination of the curves of section 6. The contours of $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ are shown in Fig. 1. The interpretation of this figure is very straightforward. For example, for $p_3 \le .5$, the value of $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ is less than 5.0%. Making use of the definition of $U_1^*[\ ]$, and especially its monotone characteristic, one can then say: the approximation for $E_1[5(p_1), 5(p_2), k_3(p_3)]$, where $k_3 \ge 5$, $p_3 \le .50$, is in error by not more than 5.3%. Moreover, as has been already observed, $E_1[a(p_1+p_2), k_3(p_3)]$ is always greater than or equal to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$.

Fig. 1. Contours of $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ considered as a function of $p_1$, $p_2$ and $p_3$.
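The step from a contour value of 5.0% to the stated 5.3% is the conversion $U^*/(100 - U^*)$ from an error measured against the approximation to an error measured against the true value. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of the paper.

```python
def bound_relative_to_true(u_star_percent: float) -> float:
    """Convert a bound on (E_1a - E_1)/E_1a, given in percent,
    into the implied bound on (E_1a - E_1)/E_1, also in percent."""
    if not 0.0 <= u_star_percent < 100.0:
        raise ValueError("bound must lie in [0, 100)")
    return 100.0 * u_star_percent / (100.0 - u_star_percent)


# Contour values quoted in the text for Fig. 1.
for u_star in (5.0, 7.5):
    print(u_star, round(bound_relative_to_true(u_star), 1))
```

The two printed values correspond to the 5.3% and 8.1% figures used in this section.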

It will be noticed from Fig. 1 that $U_1^*[\ ]$ is increasing steadily as $p_3$ approaches 1. It has been demonstrated by McCarthy [5] that this behavior of the upper bound does not mean that the percentage error itself becomes larger as $p_3$ approaches 1. As a matter of fact, for fixed $k_1$, $k_2$ and $k_3$, the percentage error approaches zero as $p_3$ approaches 1. However, this demonstration does not furnish any reasonable bounds with which to fill in the lower left hand corner of Fig. 1. This fact is not as serious as it may at first seem because there is nothing to prevent one from reordering the boxes. For example, consider $E_1[5(.2), 5(.2), 5(.6)]$. From Fig. 1, the error of the approximation for this quantity, namely $E_1[E_1[5(.5), 5(.5)](.4), 5(.6)]$, is not more than approximately

7.5/(100 - 7.5) = 8.1%.

On the other hand this same figure shows that $E_1[E_1[5(.25), 5(.75)](.80), 5(.20)]$, which is also an approximation to $E_1[5(.2), 5(.2), 5(.6)]$, is in error by not more than approximately .8%. Consequently one would choose the second ordering.

The procedure which has been used to obtain an upper bound on the percentage error of the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, $k_1$ and $k_2$ fixed and $k_3$ greater than or equal to that integer at which the bound is evaluated, can also be applied to $E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. All the assumptions remain the same and in this case the bounds corresponding to $U_1[\ ]$ and $U_1^*[\ ]$ are denoted by $U_3[\ ]$ and $U_3^*[\ ]$. As in the case of $U_1[\ ]$, we have

$\frac{E_3[\ ] - E_{3b}[\ ]}{E_3[\ ]} = \frac{\dfrac{E_3[\ ] - E_{3b}[\ ]}{E_{3b}[\ ]}}{1 + \dfrac{E_3[\ ] - E_{3b}[\ ]}{E_{3b}[\ ]}} \le \frac{U_3^*[k_1(p_1), k_2(p_2), k_3(p_3)]}{100}$.

Here the approximation, $E_{3b}[\ ] = E_2[b(p_1+p_2), k_3(p_3)]$, is always less than or equal to the exact value, $E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. The contours of $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ are shown in Fig. 2. In using $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ it is sometimes advantageous to reorder the boxes. For example, consider $E_3[5(.2), 5(.2), 5(.6)]$. Fig. 2 shows that, as an approximation, $E_2[E_2[5(.5), 5(.5)](.4), 5(.6)]$ is in error by not more than approximately 9%. However, $E_2[E_2[5(.25), 5(.75)](.80), 5(.20)]$, which is also an approximation for $E_3[5(.2), 5(.2), 5(.6)]$, is in error by not more than about 7%. There is a gain here, but it is not as great as in the corresponding situation for $E_1[5(.2), 5(.2), 5(.6)]$.

As has already been stated, one may minimize the error by correctly choosing the two boxes which are to be combined first. Some discussion will be given here of a procedure for choosing these two boxes. Of course an experimental scheme may be used which makes use of the fact that the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is always an overestimate. In other words, that grouping is used which gives rise to the smallest value of the approximation. However, this can be replaced by a few preliminary computations.

As can be seen from (5.1), the error of the approximation depends upon two quantities, namely the variance of the two box situation obtained by combining two of the boxes, and the maximum value of the second derivative of the curve representing the function $E_1[X(p_1+p_2), k_3(p_3)]$ over the proper range of $X$ values. The error will be zero if $E_1[X(p_1+p_2), k_3(p_3)]$ is either a constant or linear in $X$ over the range of $X$ values in which one is interested, that is $k_1 \le X \le k_1 + k_2 - 1$, $k_1 \le k_2$. If this is not possible, then one wishes to make it as near so as possible, subject to the restriction that

$\sigma_1^2\bigl[k_1(p_1/(p_1+p_2)), k_2(p_2/(p_1+p_2))\bigr]$

is not unnecessarily large.

Fig. 2. Contours of $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ considered as a function of $p_1$, $p_2$ and $p_3$.


An indication of the relationship between the boxes for both linearity and contribution to variance can be obtained from expressions (3.10) and (3.11). Thus for each box one computes $k_i/p_i$ and $k_i(1-p_i)/p_i^2$. Then in order to most nearly achieve linearity one orders the boxes in accordance with the increasing order of $k_i/p_i$ and combines them in that order. If there is a tie between two or more boxes with respect to the $k_i/p_i$ ordering, then one orders these "tied" boxes in accordance with increasing $k_i(1-p_i)/p_i^2$.
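The ordering rule just described can be sketched as a two-key sort. This is only an illustration of the stated rule, not the author's computing procedure; the function name and the `(k, p)` tuple layout are assumptions, and exact fractions are used so that ties in $k_i/p_i$ are detected reliably.

```python
from fractions import Fraction


def combination_order(boxes):
    """Sort boxes (k, p) by increasing k/p, breaking ties by increasing
    k(1 - p)/p^2, which is the order in which they are to be combined."""
    def key(box):
        k, p = box
        p = Fraction(p)
        return (k / p, k * (1 - p) / p**2)
    return sorted(boxes, key=key)


# The three situations of Tables 1 and 2 (p = 1/6, 1/3, 1/2 throughout).
p1, p2, p3 = Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)
cases = [
    [(2, p1), (4, p2), (6, p3)],
    [(4, p1), (6, p2), (2, p3)],
    [(6, p1), (4, p2), (2, p3)],
]
for case in cases:
    print([k for k, _ in combination_order(case)])
```

For these three cases the sort returns the integers in the orders 6, 4, 2; 2, 6, 4; and 2, 4, 6.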

Some computations have been carried out to illustrate these points and they are given in Table 1. The notation ((2, 4), 6) means that one first combines the boxes with integers 2 and 4, and then combines this result with the box with associated integer 6. All values in this table were obtained by direct computation. No use of the curves was made.

TABLE 1
Effect of Order of Combination on Error of Approximation

                                                           % Error of Approximation
                                                           Order of Combination
p_1   k_1   p_2   k_2   p_3   k_3   E_1[k_1(p_1), k_2(p_2), k_3(p_3)]   ((2, 4), 6)   ((2, 6), 4)   ((4, 6), 2)
1/6    2    1/3    4    1/2    6                 6.96                      +1.7          +2.4          +1.0
1/6    4    1/3    6    1/2    2                 3.92
1/6    6    1/3    4    1/2    2                 3.77                       0            +1.3          +1.6

In these three situations, one obtains the values given in Table 2.

TABLE 2

                        First case           Second case          Third case
p_i                   1/6   1/3   1/2      1/6   1/3   1/2      1/6   1/3   1/2
k_i                    2     4     6        4     6     2        6     4     2
k_i/p_i               12    12    12       24    18     4       36    12     4
k_i(1 - p_i)/p_i^2    60    24    12      120    36     4      180    24     4

Thus in the first case there is nothing to choose with respect to $k_i/p_i$, but $k_i(1-p_i)/p_i^2$ indicates the ordering ((6, 4), 2). Actually the percentage error in this instance is 1.0 as compared with 1.7 and 2.4 for the other two orderings. In case two, $k_i/p_i$ indicates the ordering ((2, 6), 4). Although this does not turn out to be the best ordering, Table 1 shows that the ordering in this instance makes little difference. In the last case, the indicated ordering is ((2, 4), 6) and the percentage error for this is zero, as opposed to 1.3 and 1.6. Since at any stage in the operation of combining boxes two at a time (4.13) holds, the above procedure will also give the minimum error for the approximation to $E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. Moreover, the approximation for this quantity is always an underestimate of the true value, and therefore that ordering should be taken which gives the greatest value for the approximation.

When the error of the approximation to $E_1[k_1(p_1), \ldots, k_n(p_n)]$ and $E_n[k_1(p_1), \ldots, k_n(p_n)]$, for $n$ greater than three, is considered, it is immediately obvious that the general considerations already given in this section still apply. In addition to these considerations, there is the difficulty that errors may cumulate. However, the results already quoted for three boxes, in conjunction with those which are to be given in 5.3, indicate that this cumulation is not serious. There are two factors which eventually prevent (i.e. as more and more boxes are considered) this percentage error from becoming unduly large, and, in fact, make it approach zero. These are:

1. The value of $p_3$ will, in most instances, be decreasing as more and more boxes are considered (see Fig. 1), and

2. The true value is usually becoming larger and larger as more and more boxes are considered.

In order to minimize the error, the following precautions should be taken:

1. At each stage in the computation, try to avoid, as much as possible, making readings where $E_1[X(p_1+p_2), k_3(p_3)]$ is curving sharply. If all readings are made where the curves are nearly linear, the percentage error will be very close to zero. On the other hand, if many readings must be made where the slopes of the curves are changing most sharply, larger errors must be expected.

2. Use that ordering of the boxes which provides the minimum value for the approximation to $E_1[\ ]$ or the maximum value for the approximation to $E_n[\ ]$.

3. In order to approximate the ordering which (2) would give, compute $k_i/p_i$ and $k_i(1-p_i)/p_i^2$ at each stage at which two boxes are to be combined and use the rules of procedure already given for three boxes.

5.3. Error of the approximation for $E_s[k^n]$. Repeated applications of the reduction formulae (4.34) and (4.35) allow one to evaluate $E_s[k^n]$ by means of the solution for the two box case, or more explicitly, by means of the curves given in section 6. Here the error of this approximation will be discussed primarily from a computational point of view.

$E_s[1^n]$ can be treated in detail since it is possible to obtain exact values for this expression by means of (4.1). This has been done by McCarthy [5], but the details will not be repeated here because of lack of space. The results simply add more credence to the conjectures which will soon be made.

When $k$ is taken to be larger than one, the difficulty arises that it is almost impossible to compute the exact value of $E_s[k^n]$ in a large number of cases. Consequently it was necessary to devise an experimental model to estimate these exact values so that the amount of error would be known within bounds. A set of 10,000 punched cards$^1$ was obtained on which were recorded 100,000 random numbers drawn from a rectangular distribution. Thus if the cards are ordered on a particular set of columns, and one reads off the digits 0-9 on another specified column, one card at a time, it is equivalent to using a table of random numbers such as those prepared by Tippett [7]. By the use of these cards, it was possible to run off on an IBM Tabulator any desired number of experiments in order to obtain an experimental distribution from which to calculate an estimate of $E_s[k^n]$ and the variance of this estimate. For example, in determining an estimate of $E_1[2^5]$ one hundred experimental trials were made, as described above, with the following results:


Number of Trials Required     2     3     4     5     6
Frequency                    23    32    31    11     3


1 These punched cards were prepared at the Mayo Clinic, Rochester, Minn., under the 
direction of Doctor Joseph Berkson. 





From this distribution the estimate of $E_1[2^5]$ is 3.39, with a variance computed from the distribution of .011. The 95% symmetric confidence limits for the mean, computed from the Student $t$-distribution, are 3.17 and 3.61. Such estimates will be used in the remainder of this section. It should be pointed out that in order to prevent a prohibitive amount of machine time, it was necessary to use many of the same runs to determine values of $E_s[k^n]$ for different values of $s$, $k$ and $n$. This means that the errors are correlated to some slight extent, but it would be extremely difficult to determine how much.

TABLE 3
Percentage Errors for $E_s[k^n]$

 s     k        n = 3            n = 4            n = 5
 1     1         --               --               --
       2        + .7             + 2.2          - .3  +13.6
       5        +1.1          -3.1  +5.7        + .6  +10.7
      10                                        -2.9  + 5.1
 2     1        -5.6             - .4             +1.3
       2        -4.6          -4.4  +4.4        + .6  +10.4
       5     -4.6  +1.7       +3.0  +9.3        +7.9  +14.8
      10     -3.7  +2.1       - .3  +5.5        +4.3  +10.7
      15                      +1.0  +7.2
      20     -2.5  +2.4
 3     1       -18.2            -12.7             -3.1
       2        -6.3         -16.5  -7.3        -2.9  + 6.0
       5     -9.7  -2.2      -10.7  -5.5        + .8  + 5.8
      10                                        -2.1  + 3.1
 4     1                        -12.0            -15.6
       2                     -13.6  +6.1       -11.6  - 3.9
       5                     -13.9  -7.2        -9.9  - 4.0
      10                      -8.9  -2.6        -6.4  - 1.2
 5     1                                          -8.8
       2                                       -18.1  - 6.0
       5                                       -12.5  - 5.6
      10                                        -8.9  - 2.9

A summary of the computed percentage errors for various values of $s$, $k$ and $n$ is given in Table 3. In the instances where there are two entries, they are calculated on the basis of the 95% confidence limits for the experimental mean. These confidence limits are symmetric and were determined by using the Student $t$-distribution. For $k$ equal to 2 and 5 the distributions contained 100 trials, while for $k$ greater than 5, the distributions were made up of approximately 50 trials.
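The estimate 3.39 and the variance .011 quoted for $E_1[2^5]$ follow directly from the frequency table of the one hundred experimental trials. The sketch below recomputes them, assuming (as the quoted figure suggests) that the variance is that of the sample mean with an $n-1$ divisor; the function name is hypothetical. The confidence limits are not recomputed here because they depend on the $t$ quantile used.

```python
def summarize(freq):
    """Return (number of trials, sample mean, estimated variance of the mean)
    for a frequency table {observed value: count}."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())
    sample_var = sum_sq / (n - 1)      # unbiased variance of a single trial
    return n, mean, sample_var / n     # variance of the mean of n trials


# Number of throws required -> frequency, as tabulated for E_1[2^5].
trials = {2: 23, 3: 32, 4: 31, 5: 11, 6: 3}
n, mean, var_of_mean = summarize(trials)
print(n, round(mean, 2), round(var_of_mean, 3))
```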

The computations given in this table show, for various values of $s$, $k$ and $n$, the percentage error of the approximation for $E_s[k^n]$. In addition to showing the values of these percentage errors, the computations lead one to conjecture that

1. For fixed $s$ and $k$, there exists an $n_0$ such that for $n > n_0$ the absolute value of the percentage error of the approximation for $E_s[k^n]$ is a monotone decreasing function for increasing $n$. It was shown by McCarthy [5] that this absolute value approaches zero as $n$ approaches infinity for $E_s[1^n]$, and in fact, that the difference between the true and approximate values approaches zero.

2. For fixed $s$ and $n$, there exists a $k_0$ such that for $k > k_0$, the absolute value of the percentage error of the approximation for $E_s[k^n]$ is a monotone decreasing function for increasing $k$.

6. Computation. 

6.1. Curves to aid in the computation of $E_1[k_1(p_1), k_2(p_2)]$. In 3.1 it was shown that $E_1[k_1(p_1), k_2(p_2)]$ is equal to

$\frac{k_1}{p_1}\, I_{p_1}(k_1 + 1, k_2) + \frac{k_2}{p_2}\, I_{p_2}(k_2 + 1, k_1)$,

where $I_x(p, q)$ is the Incomplete Beta-Function as tabled by Karl Pearson [6]. There are three principal difficulties connected with the use of these tables as they apply to the approximations of this paper. These are:

1. The tables must be available,

2. The tables give directly only values for integer or half-integer values of $k_1$ and $k_2$, and

3. Since many different values of $E_1[k_1(p_1), k_2(p_2)]$ are often required to obtain a single approximation, the computational burden would be very heavy.
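For integer $k_1$ and $k_2$ the Incomplete Beta-Function expression above can be evaluated without tables. The sketch below assumes the formula as read from the text, $E_1 = (k_1/p_1) I_{p_1}(k_1+1, k_2) + (k_2/p_2) I_{p_2}(k_2+1, k_1)$, and uses the standard identity $I_x(a, b) = P(\mathrm{Bin}(a+b-1, x) \ge a)$ for positive integers $a$, $b$; the function names are hypothetical.

```python
from math import comb


def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a and b, computed as the upper tail
    of a binomial distribution with a + b - 1 trials and success chance x."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def e1_two_boxes(k1, p1, k2, p2):
    """E_1[k1(p1), k2(p2)] with p1 + p2 = 1 and integer k1, k2."""
    return (k1 / p1) * reg_inc_beta_int(p1, k1 + 1, k2) + \
           (k2 / p2) * reg_inc_beta_int(p2, k2 + 1, k1)


print(round(e1_two_boxes(2, 0.40, 5, 0.60), 4))
```

The printed value can be compared with the figure 4.2224 quoted in this section for $E_1[2(.40), 5(.60)]$.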

In order to surmount these difficulties, it seemed advisable to prepare curves giving the values of $E_1[k_1(p_1), k_2(p_2)]$ for various values of $k_1$, $k_2$, $p_1$ and $p_2$. These curves would give values of $E_1[\ ]$ with sufficient accuracy for most problems not only for integer values of $k_1$ and $k_2$, but for all values over the range considered.

Such curves have been prepared by computing $E_1[k_1(p_1), k_2(p_2)]$ for integral values of $k_1$ and $k_2$ (for fixed $p_1$ and $p_2$) and then joining these points with a smooth curve. A summary of the graphs prepared is as follows:




            k_1                      k_2               p_1     p_2
Fig. 3      1, 2, ..., 25, ∞         1, 2, ..., 35     .50     .50
Fig. 4      1, 2, ..., 20, ∞         1, 2, ..., 35     .40     .60
Fig. 5      1, 2, ..., 15, ∞         1, 2, ..., 35     .20     .80
Fig. 6      1, 2, ..., 10, ∞         1, 2, ..., 15     .80     .20
Fig. 7      1, 2, ..., 7, ∞          1, 2, ..., 15     .60     .40
Fig. 8      1, 2, ..., 8, ∞          1, 2, ..., 15     .50     .50
Fig. 9      1, 2, ..., 6, ∞          1, 2, ..., 15     .40     .60
Fig. 10     1, 2, ..., 5, ∞          1, 2, ..., 15     .20     .80







Figures 8, 9 and 10 are simply portions of Figures 3, 4 and 5 drawn on an expanded scale in order to permit greater accuracy in reading the curves. Also Figures 6 and 10 and Figures 7 and 9 form pairs in that a member of one pair can be obtained from the other member of the pair. Both members of the pair are given on the expanded scale in order to facilitate interpolation. Values of the mean for combinations of $k_1$ and $k_2$ not given directly can usually be obtained with sufficient accuracy with linear interpolation. Interpolation for $p_1$ and $p_2$ should be done graphically since in some instances linear interpolation would be extremely poor.

[Figure: curves of $E_1[k_1(.40), k_2(.60)]$.]

As an example, suppose one has two boxes with $k_1 = 2$, $k_2 = 5$, $p_1 = .40$ and $p_2 = .60$. Consulting Fig. 9, one goes along the horizontal axis to $k_2 = 5$. Following up the vertical line through this point to the curve $k_1 = 2$, $E_1[2(.40), 5(.60)]$ is read as 4.25. The actually computed value to four decimals is 4.2224.
















































It is immediately evident that $E_2[k_1(p_1), k_2(p_2)]$ can also be obtained from the curves since

$E_2[k_1(p_1), k_2(p_2)] = (k_1/p_1) + (k_2/p_2) - E_1[k_1(p_1), k_2(p_2)]$.
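This identity can be checked numerically by summing survival probabilities, taking $E_1$ as the mean number of throws until either box has its quota and $E_2$ as the mean number until both have. The sketch below is only a check under those definitions; the function names and the truncation point `t_max` are assumptions made for illustration.

```python
from math import comb


def binom_pmf(t, j, p):
    """P(exactly j of t throws fall in a box that receives each throw with chance p)."""
    return comb(t, j) * p**j * (1 - p) ** (t - j)


def e1(k1, p1, k2):
    """Mean throws until box 1 holds k1 balls or box 2 holds k2 (p2 = 1 - p1).
    P(T > t) vanishes once t >= k1 + k2 - 1, so the sum is finite."""
    total = 0.0
    for t in range(k1 + k2 - 1):
        lo, hi = max(0, t - (k2 - 1)), min(k1 - 1, t)
        total += sum(binom_pmf(t, j, p1) for j in range(lo, hi + 1))
    return total


def e2_direct(k1, p1, k2, t_max=200):
    """Mean throws until both boxes hold their quotas, by a truncated survival sum."""
    total = 0.0
    for t in range(t_max):
        both_full = sum(binom_pmf(t, j, p1) for j in range(k1, t - k2 + 1))
        total += 1.0 - both_full
    return total


k1, p1, k2, p2 = 2, 0.40, 5, 0.60
via_identity = k1 / p1 + k2 / p2 - e1(k1, p1, k2)
print(round(via_identity, 4), round(e2_direct(k1, p1, k2), 4))
```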

6.2. Use of the curves to obtain exact values (i.e. subject only to the error of reading the curves) for $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$. Referring back to (4.8), one obtains that

(6.1)  $E_1[k_1(p_1), k_2(p_2), k_3(p_3)] = \sum_i \pi_i\, E_1[x_i(p_1 + p_2), k_3(p_3)]$,

where $\pi_i$ is the probability that either $k_1$ balls are obtained in the first box or $k_2$ balls are obtained in the second box on the $x_i$th throw for the first time, assuming balls can go only in boxes one and two. $x_i$ takes on values

$k_1, k_1 + 1, \ldots, k_1 + k_2 - 1$

when $k_1 \le k_2$. Now $\pi_i$ can be easily computed and $E_1[x_i(p_1 + p_2), k_3(p_3)]$ can be obtained from the curves. The only difficulty in using this procedure arises when the range of $x_i$ is large. Then a large amount of computation is involved.

In order to illustrate this computation, consider $E_1[2(.1), 3(.1), 5(.8)]$. Here $x_i$ takes on the values 2, 3 and 4. We have $x_1 = 2$, $\pi_1 = 2/8$; $x_2 = 3$, $\pi_2 = 3/8$; and $x_3 = 4$, $\pi_3 = 3/8$. From Fig. 6

$E_1[2(.2), 5(.8)] = 5.09$
$E_1[3(.2), 5(.8)] = 5.88$
$E_1[4(.2), 5(.8)] = 6.11$.



















































Consequently, $E_1[2(.1), 3(.1), 5(.8)]$ is equal to

(5.09)(2/8) + (5.88)(3/8) + (6.11)(3/8) = 5.77.

Using computed values for $E_1[x_i(.2), 5(.8)]$, $E_1[2(.1), 3(.1), 5(.8)]$ is equal to 5.75. Thus the use of the curves has only led to an error of .3%.
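The illustration above can be retraced step by step: first the weights $\pi_i$ for the combined box, then the weighted sum of two-box means. The following is a sketch under the definitions of this section; the function names are hypothetical, and the "computed" two-box values are obtained here by a finite sum of survival probabilities rather than from the Incomplete Beta-Function tables.

```python
from math import comb


def first_fill_distribution(k1, k2, q1):
    """pi[x] = P(throw x is the first on which box 1 holds k1 balls or box 2
    holds k2), when every throw lands in box 1 (chance q1) or box 2 (1 - q1)."""
    q2 = 1.0 - q1
    pi = {}
    for x in range(min(k1, k2), k1 + k2):
        prob = 0.0
        if x >= k1 and x - k1 < k2:   # box 1 completes on throw x
            prob += comb(x - 1, k1 - 1) * q1**k1 * q2 ** (x - k1)
        if x >= k2 and x - k2 < k1:   # box 2 completes on throw x
            prob += comb(x - 1, k2 - 1) * q2**k2 * q1 ** (x - k2)
        pi[x] = prob
    return pi


def e1_two_boxes(k1, p1, k2):
    """E_1[k1(p1), k2(1 - p1)] as a finite sum of P(T > t)."""
    total = 0.0
    for t in range(k1 + k2 - 1):
        for j in range(max(0, t - k2 + 1), min(k1 - 1, t) + 1):
            total += comb(t, j) * p1**j * (1 - p1) ** (t - j)
    return total


pi = first_fill_distribution(2, 3, 0.5)          # boxes with p = .1 and .1
curve_readings = {2: 5.09, 3: 5.88, 4: 6.11}     # values read from Fig. 6
from_curves = sum(pi[x] * curve_readings[x] for x in pi)
computed = sum(pi[x] * e1_two_boxes(x, 0.2, 5) for x in pi)
print(pi, round(from_curves, 2), round(computed, 2))
```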


















































































6.3. Use of the curves in approximating $E_1[k_1(p_1), \ldots, k_n(p_n)]$, $E_n[k_1(p_1), \ldots, k_n(p_n)]$ and $E_s[k^n]$. In illustrating the application of the curves and the reduction formulae (4.34) and (4.35), one example will be worked through in detail. This example will provide illustrations of all the details involved in such problems. Consider $E_4[5^5]$. Applying formula (4.34),

(6.2)  $E_4[5^5] \cong \frac{5}{4}\Bigl\{E_1[5^4] + E_3\Bigl[\Bigl(5 - \frac{E_1[5^4] - 5}{3}\Bigr)^3, \Bigl(5 - \frac{E_1[5^4]}{4}\Bigr)\Bigr]\Bigr\}$.

Consequently, the first step must be to compute $E_1[5^4]$. Using the principles of 4.4,

(6.3)  $E_1[5^4] \cong E_1[E_1[5^2](.50), 5(.25), 5(.25)]$.

From Fig. 8, $E_1[5^2] = 7.55$. Therefore $E_1[5^4]$ is approximately equal to $E_1[7.55(.50), 5(.25), 5(.25)]$.

Now applying the same principle again,

(6.4)  $E_1[5^4] \cong E_1[E_1[7.55(2/3), 5(1/3)](.75), 5(.25)]$.

By the use of Figures 7, 8, 9 and 10, graphical interpolation may be applied to find that $E_1[7.55(2/3), 5(1/3)]$ is equal to 9.84. The approximation procedure now says that

(6.5)  $E_1[5^4] \cong E_1[9.84(.75), 5(.25)]$.

Again applying the curves and using graphical interpolation for $p_1$ and $p_2$, $E_1[5^4] \cong 11.88$.

Substituting this value in (6.2),

(6.6)  $E_4[5^5] \cong \frac{5}{4}\{11.88 + E_3[2.71, 2.71, 2.71, 2.03]\}$.

Now formula (4.35) must be applied to $E_3[2.71, 2.71, 2.71, 2.03]$, i.e.

(6.7)  $E_3[2.71, 2.71, 2.71, 2.03] \cong \frac{4}{3}\Bigl\{E_1[(2.71)^3] + E_2\Bigl[\Bigl(2.71 - \frac{E_1[(2.71)^3] - 2.71}{2}\Bigr)^2, \Bigl(2.03 - \frac{E_1[(2.71)^3]}{3}\Bigr)\Bigr]\Bigr\}$.

$E_1[(2.71)^3]$ can be evaluated by the same method used for $E_1[5^4]$. This leads to the result

(6.8)  $E_3[2.71, 2.71, 2.71, 2.03] \cong \frac{4}{3}\{4.40 + E_2[1.86, 1.86, .56]\}$.

Once more applying (4.35),

(6.9)  $E_2[1.86, 1.86, .56] \cong \frac{3}{2}\Bigl\{E_1[(1.86)^2] + E_1\Bigl[\bigl(2 \cdot 1.86 - E_1[(1.86)^2]\bigr), \Bigl(.56 - \frac{E_1[(1.86)^2]}{2}\Bigr)\Bigr]\Bigr\}$.

$E_1[1.86, 1.86]$ is equal, by the curves, to 2.25. Therefore

(6.10)  $E_2[1.86, 1.86, .56] \cong \frac{3}{2}\{2.25 + E_1[1.47, -.56]\}$.

However, since the convention is observed that a negative quantity is replaced by zero,

(6.11)  $E_1[1.47, -.56] = E_1[1.47, 0] = 0$.

Now working back through these various expressions,

(6.12)  $E_4[5^5] \cong \frac{5}{4}\bigl[11.88 + \frac{4}{3}\bigl[4.40 + \frac{3}{2}[2.25 + 0]\bigr]\bigr] = 27.81$.

From Table 3 it can be seen that the percentage errors for this approximation to $E_4[5^5]$, corresponding to the 95% confidence limits for this quantity, are -4.0% and -9.9%.

This example has illustrated most of the situations which will arise in the use of the approximations of this paper.
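The arithmetic of this worked example can be retraced in a few lines. The sketch below takes the curve readings 11.88, 4.40 and 2.25 as given inputs (they are not recomputed) and applies the reduction step as it is used in (6.2) to (6.11), including the convention that a negative quantity is replaced by zero; the function name and argument layout are assumptions made for illustration.

```python
def reduced_quotas(k_common, k_last, e1, n):
    """One reduction step for n boxes, the first n - 1 sharing the quota
    k_common: return the quotas still outstanding for the n - 2 unfilled
    boxes of that group and for the last box, given e1 = E_1 for the group.
    Negative quantities are replaced by zero."""
    remaining = max(k_common - (e1 - k_common) / (n - 2), 0.0)
    last = max(k_last - e1 / (n - 1), 0.0)
    return remaining, last


# Curve readings quoted in the text, used here as given inputs.
step1 = reduced_quotas(5, 5, 11.88, 5)         # quotas quoted as 2.71 and 2.03
step2 = reduced_quotas(2.71, 2.03, 4.40, 4)    # quotas quoted as 1.86 and .56
step3 = reduced_quotas(1.86, 0.56, 2.25, 3)    # quotas quoted as 1.47 and 0

# Working back through the nested expression (6.12).
total = 5 / 4 * (11.88 + 4 / 3 * (4.40 + 3 / 2 * (2.25 + 0.0)))
print(step1, step2, step3, round(total, 2))
```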

6.4. Miscellaneous approximation formulae useful for computation. There exists a relatively simple approximation to $E_1[k_1(p_1), k_2(p_2)]$, $p_1 + p_2 = 1$, when $p_2$ is near one. Using (3.8) and making some obvious simplifications, one obtains

EAkM, hw] =i 2 +* 7T fe n |,, fe)! T ^ n ~ ty-\t - Pl ) dt. 

p 2 p 2 (k 1 — 1 )!(«■* — 1)! Pi Jo 

Since pi is near zero, (1 — t) can be replaced by one, and the result is obtained 
that 


EAkiipi), k,(p2)\ 

P 2 


1 k l (hi + h 2 ) I 

P2 Pl (h + mh-iyr 


An approximation to the Incomplete Beta-Function, given by Tukey and Scheffé [8], may also prove useful at times. The expression, changed slightly by those authors since publication, is


Ib(n — r + 1, r) — 1 


1 

2r(r) 



d X \ 


where 


2 

Xa 



_ b) n±i _ 1 

r 

Vb , 


+ 2r. 


The right hand side of the first expression will be recognized as the $\chi^2$ distribution with $2r$ degrees of freedom. In the event that the tables of $\chi^2$ are not adequate for the application of these expressions, the approximation of Wilson and Hilferty [10] should be used. This approximation states that $(\chi^2/\nu)^{1/3}$, where $\nu$ is the number of degrees of freedom, is approximately normally distributed with mean $1 - 2/(9\nu)$ and variance $2/(9\nu)$, for large $\nu$.
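The Wilson and Hilferty approximation is easily put into computational form. The sketch below evaluates the approximate $\chi^2$ probability integral using only the normal distribution; the function name is hypothetical, and the check value 18.307 is the familiar upper 5% point of $\chi^2$ with 10 degrees of freedom.

```python
from math import erf, sqrt


def chi2_cdf_wilson_hilferty(x, v):
    """Approximate P(chi-square with v degrees of freedom <= x) by treating
    (x / v)^(1/3) as normal with mean 1 - 2/(9v) and variance 2/(9v)."""
    z = ((x / v) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * v))) / sqrt(2.0 / (9.0 * v))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


print(round(chi2_cdf_wilson_hilferty(18.307, 10), 3))
```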





7. Acknowledgements. The author wishes to express his grateful appreciation for the many helpful comments and suggestions received from Professors W. G. Cochran, A. M. Mood, J. W. Tukey and S. S. Wilks.

REFERENCES 

[1] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, Oliver and Boyd, London, 1943, Table XXII.

[2] M. A. Girshick, Frederick Mosteller, and L. J. Savage, "Unbiased estimates for certain binomial sampling problems with applications", Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[3] J. B. S. Haldane, "On a method of estimating frequencies", Biometrika, Vol. 33 (1945), pp. 222-225.

[4] P. S. Laplace, Théorie Analytique des Probabilités, Mme. Ve Courcier, Paris, 1820, pp. 194-219.

[5] P. J. McCarthy, Approximate Solutions for Means and Variances in a Certain Class of Box Problems, unpublished thesis, Library, Princeton University, 1946.

[6] Karl Pearson, Tables of the Incomplete Beta-Function, The "Biometrika" Office, London, 1934.

[7] L. H. C. Tippett, Random Sampling Numbers, Cambridge Univ. Press, 1927.

[8] J. W. Tukey and H. Scheffé, "A formula for sample sizes for population tolerance limits", Annals of Math. Stat., Vol. 15 (1944), p. 217.

[9] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937, p. 181.

[10] E. B. Wilson and M. M. Hilferty, "The distribution of chi-square", Proc. Nat. Acad. Sci., Vol. 17 (1931), pp. 684-688.



THE DISTRIBUTION OF THE RANGE 1 

By E. J. Gumbel 
Brooklyn College, N. Y.

1. Summary. The asymptotic distribution of the range $w$ for a large sample taken from an initial unlimited distribution possessing all moments is obtained by the convolution of the asymptotic distribution of the two extremes. Let $\alpha$ and $u$ be the parameters of the distribution of the extremes for a symmetrical variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then its asymptotic probability $\Psi(R)$ and its asymptotic distribution $\psi(R)$ may be expressed by the Hankel function of order one and zero. A table is given in the text.

The asymptotic distribution $g(w)$ of the range proper is obtained from $\psi(R)$ by the usual linear transformation. The initial distribution and the sample size influence the position and the shape of the distribution of the range in the same way as they influence the distribution of the largest value. If we take the parameters from the calculated means and standard deviations, the asymptotic distribution of the range gives a good fit to the calculated distributions for normal samples from size 6 onward. Consequently the distribution of the range for normal samples of any size larger than 6 may be obtained from the asymptotic distribution of the reduced range.

The asymptotic probabilities and the asymptotic distributions of the $m$th range and of the range for asymmetrical distributions are obtained by the same method and lead to integrals which may be evaluated by numerical methods.

2. Introduction. For any initial distribution, and any sample size $n$, the distribution of the range may easily be written down in the form of an integral. However, for many given initial distributions the integration can be carried out, if at all, only for very small sample sizes, say $n = 2$ or $n = 3$. For larger samples, complicated numerical calculations have to be made, and there is no way of obtaining the distribution for $n + 1$ observations from the distribution for $n$ observations.

Our object is to obtain the asymptotic distribution of the range. Nothing is supposed to be known about the initial distribution, except that it is of the exponential type [9] which assures that it is unlimited in both directions, and possesses all moments. It will be shown that this condition is sufficient for the existence of an asymptotic distribution of the range.

With increasing sample sizes the distribution of the range may approach its asymptotic form quickly or slowly. This behavior depends upon the nature of the initial distribution. Two examples of this approach will be shown.


1 Research done with the support of a grant from the Social Science Research Council. 



3. The exact distribution of the range. Let <p(x) be any initial distribu¬ 
tion, 4>(.r) the probability of a value equal to, or less than, x. Then, for samples 
of size n, the joint distribution fo n (xi, x n ) of the smallest value X\ and the largest 
value x n is 


(1) ton(3i , Xn) = n(n — l)<p(xi)($(x n ) — $(xi)) n V(«n). 

The distribution g n (w n ) of the range w n defined by 

(2) x n = Xi + Wn 

is obtained by integrating over all values Xi ^ z n whence 

(3) (Jniu'n) = n(n - 1) f ($(z + w n ) - $(z)) n ~ 2 <p(x + w n )<p(x) dz t 

J— 00 


where the index 1 has been dropped. The probability $G_n(w_n)$ for the range to
be equal to, or less than, $w_n$ is obtained by integration of (3), whence, by
reversing the order of integration,

$G_n(w_n) = n \int_{-\infty}^{+\infty} \int_{0}^{w_n} (n-1)\,(\Phi(x + w_n) - \Phi(x))^{n-2}\,d\Phi(x + w_n)\,d\Phi(x),$

or, after integration,

$G_n(w_n) = n \int_{-\infty}^{\infty} (\Phi(x + w_n) - \Phi(x))^{n-1}\,d\Phi(x),$

a formula to which Prof. H. Hotelling has drawn my attention. The beauty of
this formula is completely marred by the facts that, in general, we cannot express
$\Phi(x + w_n)$ by $\Phi(x)$, and that the numerical integration is lengthy and tiresome.
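The single-integral formula for the probability of the range can be evaluated numerically today with little effort. The following is only a sketch: it assumes a standard normal initial distribution, uses Simpson's rule on a finite interval, and the function name `range_cdf` and its integration limits are illustrative choices, not part of the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: standard normal initial distribution


def range_cdf(w, n, lo=-10.0, hi=10.0, steps=4000):
    """G_n(w) = n * integral of (Phi(x+w) - Phi(x))**(n-1) * phi(x) dx.

    Evaluated with Simpson's rule on [lo, hi]; steps must be even.
    """
    if w <= 0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = (STD_NORMAL.cdf(x + w) - STD_NORMAL.cdf(x)) ** (n - 1) * STD_NORMAL.pdf(x)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return n * total * h / 3


# For n = 2 the range is |x1 - x2|, so G_2(w) = 2 * Phi(w / sqrt(2)) - 1.
print(range_cdf(1.0, 2), 2 * STD_NORMAL.cdf(1 / math.sqrt(2)) - 1)
```

The case n = 2 serves as a check because the closed form is known; for larger n the same routine applies, but the cost of a fresh integration for every n is exactly the inconvenience described above.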

The problem of the range for the normal distribution was first raised twenty-five
years ago by L. von Bortkiewicz [1, 2]. For n = 2 and n = 3 the distribution
of the normal range may be written down explicitly [12, 13]. For larger
normal samples up to n = 20, E. S. Pearson [16] and H. O. Hartley [10] have
calculated numerical tables of the probability of the range. L. H. C. Tippett
[20] has calculated the mean, the standard deviation, and the moment quotients
for the range of the normal distribution up to n = 1000. He gave formulae for
the moments in the form of integrals. Finally "Student" [18] reproduced the
distribution of the range for small samples, n = 2, 3, 4, 5, 6, 10, by Pearson's
type I, and gave a formula for large samples, n = 20, 60, based on Pearson's type
VI, a procedure which is purely empirical and, therefore, unsatisfactory for
theoretical purposes. A good résumé of the present knowledge about the
range is given in Karl Pearson's Tables [17].

All these studies are confined to the normal distribution and allow no conclu¬ 
sion about the asymptotic distribution of the range. According to Kendall [11] 
it is not known whether such forms exist and what they are. This question may 
at once be answered for a special case. If the distribution is limited to the left 
(or to the right), the asymptotic distribution of the range is equal to the asymp- 





E. J. GUMBEL 


totic distribution of the largest (smallest) value. The asymptotic distribution 
of the range exists provided that an asymptotic distribution of the largest 
(smallest) value exists. For the exponential distribution, and for initial
distributions of the Pareto type, for example, the asymptotic distribution of the
range is equal to the asymptotic distribution of the largest value. The
asymptotic distribution of the range for the rectangular distribution has been
derived by A. G. Carlton [3].

4. The asymptotic distribution of the reduced range for a symmetrical
variate. Instead of the procedures mentioned in the last paragraph, let us
consider a large sample. It is generally assumed that the smallest and the
largest values are independent in that case. L. H. C. Tippett [20] has shown
that the correlation between the extremes is negligible for the normal distribution
and for sample sizes $n \ge 200$. In a previous note [9] it has been shown that
independence holds for large samples and for initial distributions of the
exponential type unlimited in both directions and possessing all moments. Then
the joint distribution (1) splits into the product of the asymptotic distribution
$f_1(x_1)$ of the smallest value $x_1$ and the asymptotic distribution $f_n(x_n)$ of the largest
value $x_n$

(4) $\mathfrak{w}(x_1, x_n) = f_1(x_1)\,f_n(x_n).$

If, furthermore, the initial distribution is symmetrical about zero, the two
asymptotic distributions are

(5) $f_1(x_1) = \alpha \exp[\alpha(x_1 + u) - e^{\alpha(x_1 + u)}]; \quad f_n(x_n) = \alpha \exp[-\alpha(x_n - u) - e^{-\alpha(x_n - u)}].$

These asymptotic distributions and the corresponding probabilities are traced, 
in a reduced scale, on Graphs (1) and (2). 

Since the two parameters u and $\alpha$ will exist also in the asymptotic distribution
of the range, their nature must briefly be explained. The value u is defined as
the solution of

(6) $\Phi(u) = 1 - \frac{1}{n}.$

Since

(6') $n(1 - \Phi(u)) = 1,$

the largest value u may be called the expected largest value. It differs, of course,
from the mean of the largest value. It has been shown [6] that u increases as
a function of the logarithm of n, the function depending upon the initial
distribution.

Criteria for the approach of the distribution of the largest value toward its 
asymptotic form have been given by R. A. Fisher and L. H. C. Tippett [4]. 





For our purpose it is sufficient to consider whether n is so large that u is very
near to the most probable largest value $\tilde{x}_n$ obtained from

(7) $\frac{n-1}{\Phi(\tilde{x}_n)}\,\varphi(\tilde{x}_n) = -\frac{\varphi'(\tilde{x}_n)}{\varphi(\tilde{x}_n)}.$

If $\tilde{x}_n = u$ holds with sufficient approximation, 2u may be interpreted as the
range of the modes for an initial symmetrical distribution.

The parameter $\alpha$ defined by

(8) $\alpha = \frac{\varphi(u)}{1 - \Phi(u)}$

also is a function of n. Three cases have to be distinguished: In the first case, $\alpha$
is a constant, or converges with n toward a constant different from zero. In the
second (and third) case, $\alpha$ increases with n without limit (decreases with n
toward zero). The three cases correspond to three classes of initial distributions
of the exponential type. The function $\alpha$ is related to the asymptotic standard
error of the largest, and of the smallest value by

(9) $\alpha^2 \sigma_n^2 = \alpha^2 \sigma_1^2 = \frac{\pi^2}{6}.$

If $\alpha$ increases (decreases) with n, or is independent of n, the standard error of
the largest value decreases (increases) with the sample size, or is independent
of it. This behavior has nothing to do with the fact that the standard error of
the mean decreases, of course, with an increasing number of samples.

The determination of the constants u and $\alpha$ from equations (6), (7), (8) is
based on the knowledge of the initial distribution and the sample size n from
which we take the largest observation. This method cannot be used in many
practical applications: 1) It may happen that the initial distribution, or the
parameters it contains, are unknown. Therefore the parameters of the largest
value cannot be obtained from it. 2) The initial distribution might be known,
but the number of observations is insufficient to warrant this procedure, because
the most probable largest value $\tilde{x}_n$ differs from the expected value u. In these
cases the parameters u and $\alpha$ have to be estimated from the observed distribution
of the largest value alone. A similar procedure will be used for the range in
paragraph 7.
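When the initial distribution is known, definitions (6) and (8) can be evaluated directly. The sketch below assumes a standard normal initial distribution and relies on Python's `statistics.NormalDist`; the function name `extreme_parameters` is illustrative only.

```python
from statistics import NormalDist


def extreme_parameters(n, dist=NormalDist()):
    """Return (u, alpha) for sample size n.

    u solves Phi(u) = 1 - 1/n (equation 6);
    alpha = phi(u) / (1 - Phi(u)) = n * phi(u) (equation 8).
    """
    u = dist.inv_cdf(1.0 - 1.0 / n)
    alpha = n * dist.pdf(u)  # because 1 - Phi(u) = 1/n
    return u, alpha


for n in (10, 100, 1000):
    u, alpha = extreme_parameters(n)
    print(n, round(u, 3), round(alpha, 3))
```

For the normal distribution both u and $\alpha$ grow with n, which places it in the second of the three classes distinguished above.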

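From (4) and (5) the joint asymptotic distribution $\mathfrak{w}(x_1, w)$ of the smallest
value $x_1$ and the range w becomes

$\mathfrak{w}(x_1, w) = \alpha^2 \exp[-\alpha(w - 2u) - e^{\alpha(x_1 + u)} - e^{-\alpha(x_1 + w - u)}].$

The asymptotic distribution g(w) of the range alone is, dropping the index 1,

(4') $g(w) = \alpha^2 e^{-\alpha(w - 2u)} \int_{-\infty}^{\infty} \exp[-e^{\alpha(x + u)} - e^{-\alpha(x + w - u)}]\,dx.$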





This distribution contains the two parameters $\alpha$ and u existing in the asymptotic
distribution of the largest value. To eliminate the two parameters, a reduced
range R is introduced by

(10) $R = \alpha(w - 2u).$

The range w is a positive variate unlimited toward the right. The reduced
range R is also unlimited toward the right yet limited toward the left by

(10') $R \ge -2\alpha u.$

The reduced range is not related to one of the averages of the range. It is the
range minus the range of the modes divided by a factor which is proportional to
the standard error of the extreme value. The distribution $\psi(R)$ of the reduced
range R, and the distribution g(w) of the range w are related by

(11) $\psi(R) = \frac{1}{\alpha}\,g(w),$

subject to restriction (10'), whereas the probability $\Psi(R)$ of the reduced range to
be equal to, or less than, R is equal to the corresponding expression G(w) for the
range proper

(11') $\Psi(R) = G(w).$

For the integration in (4') we put

$\alpha(x + w - u) = -y$

whence, from (10),

$\alpha(x + u) = -y - R.$

The asymptotic distribution of the reduced range becomes

(12) $\psi(R) = e^{-R} \int_{-\infty}^{\infty} \exp[-e^{y} - e^{-y-R}]\,dy$

and the asymptotic probability $\Psi(R)$ of the range is

(13) $\Psi(R) = \int_{-\infty}^{\infty} \exp[y - e^{y} - e^{-y-R}]\,dy,$

an expression which may easily be verified by differentiation.
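Formula (13) lends itself to a direct numerical check. The following sketch evaluates the integral by Simpson's rule on a finite interval outside of which the integrand is negligible for moderate R; the function name and the integration limits are assumptions made for illustration.

```python
import math


def reduced_range_cdf(R, lo=-40.0, hi=10.0, steps=5000):
    """Psi(R) from equation (13), integrated over y by Simpson's rule.

    The integrand exp[y - e^y - e^(-y-R)] underflows to zero well inside
    [lo, hi] for the reduced ranges of practical interest (about -3 to 10).
    """
    h = (hi - lo) / steps  # steps must be even
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        f = math.exp(y - math.exp(y) - math.exp(-y - R))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3


for R in (0.0, 1.0, 4.0):
    print(R, round(reduced_range_cdf(R), 5))
```

The printed values can be compared with the probabilities tabulated in Table I for the same reduced ranges.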

The asymptotic formulas (12) and (13) hold for any initial symmetrical
distribution of the exponential type, for example, for the normal and the logistic
distribution (see par. 7). The mean reduced range $\bar{R}$ and the higher moments
of the reduced range are easily obtained from the mean $\bar{w}$, the variance $\sigma_w^2$, and
the invariants $\lambda_\nu$ of order $\nu$ of the range proper w given in a previous paper [8].
They are

(14) $\bar{w} = 2u + \frac{2\gamma}{\alpha}; \quad \sigma_w^2 = \frac{\pi^2}{3\alpha^2}$

(15) $\lambda_\nu = \frac{2(\nu - 1)!}{\alpha^\nu} \sum_{k=1}^{\infty} \frac{1}{k^\nu}; \quad \nu \ge 2$

where $\gamma$ stands for Euler's constant.

Consequently the mean $\bar{R}$, the variance $\sigma_R^2$ and the invariants $\lambda_\nu$ of the reduced
range are

(16) $\bar{R} = 2\gamma; \quad \sigma_R^2 = \frac{\pi^2}{3}; \quad \lambda_\nu = 2(\nu - 1)! \sum_{k=1}^{\infty} \frac{1}{k^\nu}; \quad \nu \ge 2.$

Equation (14) leads to an interpretation of the reduction (10) which may be
written

$R = \alpha(w - \bar{w}) + 2\gamma$

or

(14') $R = \frac{\pi}{\sqrt{3}}\,\frac{w - \bar{w}}{\sigma_w} + 2\gamma.$

Thus the transformation (10) is a linear function of the standard transformation
$(w - \bar{w})/\sigma_w$ usual in statistics.
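The mean $2\gamma$ and the variance $\pi^2/3$ of the reduced range can be checked by simulation. The sketch below rests on an observation that follows from (5) and (10): the reduced range equals $\alpha(x_n - u) - \alpha(x_1 + u)$, the sum of two independent variates each having the probability $\exp(-e^{-y})$. The seed, sample count and function names are arbitrary illustrative choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def reduced_largest_value(rng):
    """Draw y with probability exp(-exp(-y)) by inverting a uniform variate."""
    u = rng.random()
    while u <= 0.0:  # random() may return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_reduced_ranges(count, seed=1947):
    """Simulate R as the sum of two independent reduced extremes."""
    rng = random.Random(seed)
    return [reduced_largest_value(rng) + reduced_largest_value(rng)
            for _ in range(count)]


sample = simulate_reduced_ranges(200_000)
mean = math.fsum(sample) / len(sample)
variance = math.fsum((r - mean) ** 2 for r in sample) / len(sample)
print(mean, 2 * EULER_GAMMA)
print(variance, math.pi ** 2 / 3)
```

The simulated mean and variance should agree with (16) up to sampling error.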


5. The probability of the range as a Bessel function. The integrals
(12) and (13) may be evaluated by numerical procedures, since tables of the
function $\exp(-e^{-y})$ are easily calculated. However, it turned out to be simpler
to relate these integrals to the solution of a differential equation. The
derivative $\psi'(R)$ of the distribution (12) is

$\psi'(R) = -\psi(R) + e^{-R} \int_{-\infty}^{\infty} \exp[-y - R - e^{y} - e^{-y-R}]\,dy.$

The integral is equal to the probability $\Psi(R)$ since the transformation

$y + R = -z$

leads to

$\int_{-\infty}^{\infty} \exp[-y - R - e^{y} - e^{-y-R}]\,dy = \int_{-\infty}^{\infty} \exp[z - e^{-z-R} - e^{z}]\,dz.$

Consequently the probability $\Psi(R)$ is subject to the differential equation

(17) $\Psi'' + \Psi' - e^{-R}\,\Psi = 0.$





The mode of the reduced range is a fixed value $\tilde{R}$ such that

(18) $\psi(\tilde{R}) = e^{-\tilde{R}}\,\Psi(\tilde{R}).$

Mr. W. Wasow (Swarthmore College) has drawn my attention to the fact that
the probability $\Psi(R)$ of the range can be expressed in terms of a Bessel function.²
To obtain this simplification of the differential equation we introduce a new
positive variable z by

(19) $z = 2e^{-R/2}$

and a new function U by

(20) $\Psi = U \cdot z.$

The boundary conditions are

(21) $z = 0,\ \Psi = 1; \quad z = \infty,\ \Psi = 0.$

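The first derivative becomes, from (19),

$\frac{d\Psi}{dR} = -\frac{z}{2}\,\frac{d\Psi}{dz}$

whence, from (20),

$\frac{d\Psi}{dR} = -\frac{zU}{2} - \frac{z^2}{2}\,\frac{dU}{dz}.$

The second derivative becomes, by the same procedure,

$\frac{d^2\Psi}{dR^2} = -\frac{z}{2}\,\frac{d}{dz}\left( -\frac{zU}{2} - \frac{z^2}{2}\,\frac{dU}{dz} \right).$

The second member may be written

$\frac{z}{2}\left( \frac{U}{2} + \frac{3z}{2}\,\frac{dU}{dz} + \frac{z^2}{2}\,\frac{d^2U}{dz^2} \right) = \frac{zU}{4} + \frac{3z^2}{4}\,\frac{dU}{dz} + \frac{z^3}{4}\,\frac{d^2U}{dz^2}.$

Thus the differential equation (17) is now

$\frac{z^3}{4}\,\frac{d^2U}{dz^2} + \frac{3z^2}{4}\,\frac{dU}{dz} - \frac{z^2}{2}\,\frac{dU}{dz} + \frac{zU}{4} - \frac{zU}{2} - \frac{z^3}{4}\,U = 0.$

Multiplication by $4z^{-1}$ leads to

(21') $z^2\,\frac{d^2U}{dz^2} + z\,\frac{dU}{dz} - (z^2 + 1)\,U = 0.$

This is one of the classical Bessel differential equations of order 1. In the
notation used by the British Tables [14] (pp. 264 and 213) one of the solutions is

(22) $U(z) = K_1(z),$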


² I take this occasion to thank him for this and other valuable suggestions.





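where $K_1(z)$, the modified Bessel function of the second kind (Hankel function),
is defined by

(23) $K_1(z) = (\gamma - \lg 2 + \lg z) \sum_{\nu=0}^{\infty} \frac{(z/2)^{2\nu+1}}{\nu!\,(\nu+1)!} + \frac{1}{z} - \frac{1}{2} \sum_{\nu=1}^{\infty} \frac{(z/2)^{2\nu-1}}{(\nu-1)!\,\nu!} \left( \sum_{\lambda=1}^{\nu-1} \frac{1}{\lambda} + \sum_{\lambda=1}^{\nu} \frac{1}{\lambda} \right).$

The relation between the functions $K_\nu(z)$ and the Hankel function $H_\nu^{(1)}(z)$ is

(23a) $K_\nu(z) = \frac{\pi}{2}\,i^{\nu+1}\,H_\nu^{(1)}(iz).$

The asymptotic probability for the range is, from (20) and (22),

(24) $\Psi(R) = z K_1(z)$

or, from (19),

(25) $\Psi(R) = 2e^{-R/2}\,K_1(2e^{-R/2}).$

This is the only Bessel function satisfying the boundary conditions (21). The
asymptotic probability $\Psi(R)$ of the range may be written finally from (25), (23)
and (10)

(25a) $1 - \Psi(R) = \sum_{\nu=1}^{\infty} \frac{e^{-\nu R}}{\nu!\,(\nu-1)!} \left( R - 2\gamma + 2S_\nu - \frac{1}{\nu} \right)$

where

$S_0 = 0; \quad S_\nu = \sum_{\lambda=1}^{\nu} \frac{1}{\lambda}.$

The distribution

$\psi(R) = \frac{d\Psi(R)}{dz}\,\frac{dz}{dR}$

of the reduced range R is, from (24) and (19),

$\psi(R) = -\frac{z}{2}\,(K_1(z) + z K_1'(z)).$

Now, the derivative $K_1'(z)$ is linked to the modified Bessel function $K_0(z)$ of
the second kind and of order zero by

$z K_1'(z) = -K_1(z) - z K_0(z).$

Consequently the distribution is

(26) $\psi(R) = \frac{z^2}{2}\,K_0(z)$

or, from (19),

(27) $\psi(R) = 2e^{-R}\,K_0(2e^{-R/2})$

where the function $K_0(z)$ is defined by

(28) $K_0(z) = -(\gamma - \lg 2 + \lg z) \sum_{\nu=0}^{\infty} \frac{(z/2)^{2\nu}}{(\nu!)^2} + \sum_{\nu=1}^{\infty} \frac{(z/2)^{2\nu}}{(\nu!)^2} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{\nu} \right).$

Finally the asymptotic distribution $\psi(R)$ of the reduced range may be written
from (27) and (28)

(28a) $\psi(R) = \sum_{\nu=0}^{\infty} \frac{e^{-(\nu+1)R}}{\nu!\,\nu!}\,(R - 2\gamma + 2S_\nu).$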
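The Bessel-function forms (25) and (27) can be evaluated without tables. The sketch below does not use the series (23) and (28); instead it assumes the standard integral representation $K_\nu(z) = \int_0^\infty e^{-z \cosh t} \cosh(\nu t)\,dt$ and integrates it by Simpson's rule. The function names and the truncation of the integral at t = 20 are illustrative assumptions.

```python
import math


def bessel_k(order, z, upper=20.0, steps=4000):
    """Modified Bessel function of the second kind for z > 0.

    Uses K_order(z) = integral_0^inf exp(-z cosh t) cosh(order * t) dt,
    truncated at `upper` and integrated by Simpson's rule (steps even).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        f = math.exp(-z * math.cosh(t)) * math.cosh(order * t)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3


def reduced_range_probability(R):
    """Psi(R) = z K_1(z) with z = 2 exp(-R/2), equations (24) and (19)."""
    z = 2.0 * math.exp(-R / 2.0)
    return z * bessel_k(1, z)


def reduced_range_density(R):
    """psi(R) = 2 exp(-R) K_0(2 exp(-R/2)), equation (27)."""
    z = 2.0 * math.exp(-R / 2.0)
    return 2.0 * math.exp(-R) * bessel_k(0, z)


for R in (0.0, 0.5, 10.0):
    print(R, round(reduced_range_probability(R), 5), round(reduced_range_density(R), 5))
```

The truncation at t = 20 is adequate for reduced ranges up to roughly 25; for much larger R the upper limit would have to be raised.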


We first investigate the analytic behavior and the order of magnitude of the
probability $\Psi(R)$ and the distribution $\psi(R)$ for large negative, and large positive
values of the reduced range, i.e. for large and small values of the positive variable
z. If z is so large that

(29) $z^{-3} = \frac{e^{(3R/2)}}{8} \ll 1$

the expressions for $K_1(z)$ and $K_0(z)$ become [14], p. 271,

$K_1(z) = \sqrt{\frac{\pi}{2z}}\,e^{-z} \left( 1 + \frac{3}{8z} - \frac{15}{128z^2} \right)$

$K_0(z) = \sqrt{\frac{\pi}{2z}}\,e^{-z} \left( 1 - \frac{1}{8z} + \frac{9}{128z^2} \right).$

The probability $\Psi(R)$ becomes, from (24) and (19),

(25') $\Psi(R) = \sqrt{\pi}\,\exp\left[ -\frac{R}{4} - 2e^{-(R/2)} \right] \left( 1 + \frac{3}{16}\,e^{R/2} - \frac{15}{512}\,e^{R} \right).$

The condition (29) holds, say, for R = -4. The numerical calculation leads,
for $\Psi(-4)$, to the order of magnitude $10^{-6}$.

In the same way, the distribution $\psi(R)$ becomes, from (26) and (19), for large
negative reduced ranges

(27') $\psi(R) = \sqrt{\pi}\,\exp\left[ -\frac{3R}{4} - 2e^{-(R/2)} \right] \left( 1 - \frac{1}{16}\,e^{R/2} + \frac{9}{512}\,e^{R} \right).$

This expression cannot be obtained from (25') since the approximations for
$K_0(z)$ and $K_1(z)$ used do not fulfill the relations between the derivatives given
above. The order of magnitude of $\psi(-4)$ is $10^{-6}$.

Thus the probability $\Psi(R)$ and the distribution $\psi(R)$ may be neglected for
$R \le -4$. This removes the importance of the lower limit $R \ge -2\alpha u$ stated
in (10'). If $\alpha u \ge 2$, the distribution of the range may be dealt with as if it
were practically unlimited toward the left.

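For large positive reduced ranges to which correspond small values of z, say

(29') $z^3 = 8e^{-(3R/2)} \ll 1$

the Bessel functions $K_1(z)$ and $K_0(z)$ become, from (23) and (28)

(23') $K_1(z) = \left( \gamma + \lg\frac{z}{2} \right) \left( \frac{z}{2} + \frac{z^3}{16} \right) + \frac{1}{z} - \left( \frac{z}{4} + \frac{5z^3}{64} \right)$

(28') $K_0(z) = -\left( \gamma + \lg\frac{z}{2} \right) \left( 1 + \frac{z^2}{4} \right) + \frac{z^2}{4} + \frac{3z^4}{128}.$

In this case we are interested to know how far the probability $\Psi(R)$ differs from
unity. Consequently we calculate $1 - \Psi(R)$ and obtain, from (24) and (23')

$1 - \Psi(R) = -\frac{z^2}{2} \left[ \left( \gamma + \lg\frac{z}{2} \right) \left( 1 + \frac{z^2}{8} \right) - \frac{1}{2} - \frac{5z^2}{32} \right].$

The right side becomes, from (19)

$e^{-R} \left[ (R - 2\gamma) \left( 1 + \frac{e^{-R}}{2} \right) + 1 + \frac{5}{4}\,e^{-R} \right]$

or

$e^{-R} \left[ R - 2\gamma + 1 + \frac{e^{-R}}{2} \left( R - 2\gamma + \frac{5}{2} \right) \right] = e^{-R} (R - 2\gamma + 1) \left( 1 + \frac{e^{-R}}{2} \right) + \frac{3}{4}\,e^{-2R}.$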

If R is so large that

$e^{-R} \ll 1$

we simply have

(25'') $1 - \Psi(R) = e^{-R} (R - 2\gamma + 1).$

For example, for R = 10, the preceding condition is satisfied and $1 - \Psi(R)$ is
of the order $5 \cdot 10^{-4}$.
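The two tail approximations are simple enough to evaluate by hand, but a short sketch makes the orders of magnitude concrete. It implements the large-negative formula (25') and the large-positive formula (25'') as written above; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def lower_tail_probability(R):
    """Psi(R) for large negative R, approximation (25')."""
    lead = math.sqrt(math.pi) * math.exp(-R / 4.0 - 2.0 * math.exp(-R / 2.0))
    correction = 1.0 + 3.0 / 16.0 * math.exp(R / 2.0) - 15.0 / 512.0 * math.exp(R)
    return lead * correction


def upper_tail_probability(R):
    """1 - Psi(R) for large positive R, approximation (25'')."""
    return math.exp(-R) * (R - 2.0 * EULER_GAMMA + 1.0)


print(lower_tail_probability(-4.0))   # of the order 1e-6
print(lower_tail_probability(-3.0))   # compare with Psi(-3) in Table I
print(upper_tail_probability(10.0))   # compare with 1 - Psi(10) in Table I
```

Both approximations already agree with the tabulated boundary values at R = -3 and R = 10 to within the rounding of Table I.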

In the same manner we calculate the density of probability $\psi(R)$ for large
reduced ranges. From (26), (19) and (28') we obtain

$\psi(R) = 2e^{-R} \left[ \left( \frac{R}{2} - \gamma \right) (1 + e^{-R}) + e^{-R} + \frac{3}{8}\,e^{-2R} \right].$

By neglecting $e^{-2R} \ll R$, the right side becomes

$e^{-R} [(R - 2\gamma)(1 + e^{-R}) + 2e^{-R}] = e^{-R} [R - 2\gamma + e^{-R} (R - 2\gamma + 2)]$

whence

$\psi(R) = e^{-R} (R - 2\gamma)(1 + e^{-R}) + 2e^{-2R}.$





In first approximation we obtain

(27'') $\psi(R) = e^{-R} (R - 2\gamma),$

a formula which may also be derived directly from (25''). The density of
probability is of the order $10^{-4}$ for R = 10.

From the formulae (25') and (27') valid for large negative values of R, and
from the formulae (25'') and (27'') valid for large positive values of R follow the
boundary conditions

$\lim_{R \to -\infty} \frac{\psi(R)}{\Psi(R)} = e^{-(R/2)}; \quad \lim_{R \to \infty} \frac{\psi(R)}{1 - \Psi(R)} = \frac{R - 2\gamma}{R - 2\gamma + 1}.$

For the construction of tables of the distribution $\psi(R)$ and the probability
$\Psi(R)$ of the reduced range it is sufficient to consider the interval

$-3 < R < 10.$

The two functions $K_1(z)$ and $K_0(z)$ have been tabulated [14] and [19]. Hence
the probability and the distribution could be calculated from such tables of the
Bessel functions. This procedure, however, was only used to obtain boundary
values. The tables I and Ia are based on computations made in the Calculation
and Ballistics Department at the Naval Proving Ground Dahlgren by stepwise
integration of the differential equation (17) using the special Relay Calculator
of the International Business Machines Corporation.³
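A stepwise integration of (17) is easy to reproduce in outline. The sketch below is not the procedure used at the Naval Proving Ground, whose details are not given here; it assumes a classical fourth-order Runge-Kutta scheme with step 0.01, started from the five-decimal values of the probability and the distribution at R = 0 in Table I.

```python
import math


def integrate_reduced_range(R_start, Psi_start, psi_start, R_end, h=0.01):
    """Integrate Psi'' + Psi' - exp(-R) * Psi = 0 from R_start to R_end.

    The equation is written as the system Psi' = psi, psi' = -psi + exp(-R) * Psi
    and advanced with classical RK4. Returns (Psi, psi) at R_end.
    """
    def deriv(R, Psi, psi):
        return psi, -psi + math.exp(-R) * Psi

    steps = round((R_end - R_start) / h)
    R, Psi, psi = R_start, Psi_start, psi_start
    for _ in range(steps):
        k1 = deriv(R, Psi, psi)
        k2 = deriv(R + h / 2, Psi + h / 2 * k1[0], psi + h / 2 * k1[1])
        k3 = deriv(R + h / 2, Psi + h / 2 * k2[0], psi + h / 2 * k2[1])
        k4 = deriv(R + h, Psi + h * k3[0], psi + h * k3[1])
        Psi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        psi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        R += h
    return Psi, psi


# Start from the Table I entries at R = 0 and integrate toward larger R.
for target in (1.0, 5.0):
    print(target, integrate_reduced_range(0.0, 0.27973, 0.22779, target))
```

Integrating toward larger R is the stable direction, since the second solution of (17) decays relative to $\Psi(R)$ as R increases, so small rounding errors in the starting values do not grow.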

Table I gives the probability $\Psi(R)$ (col. 2) and the distribution $\psi(R)$ (col. 4)
for the reduced ranges $-3 \le R \le 10.5$ in intervals $\Delta R = 0.5$. The differences
$\Delta\Psi$ given in col. 3 are taken from the original figures.

For different uses it is necessary to know the reduced range as a function of
its probability. This relation is shown in Table Ia. The first column gives the
probability, the first line gives the last decimal of this probability, and the cells
give the reduced range corresponding to the probability obtained from the
combination of the first column and the first line. For example: The reduced
range R = -3.20 corresponds to the probability $\Psi(R) = 0.0002$, and the reduced
range R = 10.44 corresponds to the probability $\Psi(R) = 0.9997$.

This table may be used for obtaining the percentage points of the reduced
range. The mode $\tilde{R}$, the median $\breve{R}$ calculated by the Naval Proving Ground
and the mean $\bar{R}$ obtained from (14) and (10) are

(30) $\tilde{R} = 0.506366440; \quad \breve{R} = 0.928597642; \quad \bar{R} = 1.154431330.$
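Percentage points of the reduced range can also be obtained by inverting the probability numerically. The sketch below assumes bisection on $\Psi(R) = zK_1(z)$, with $K_1$ computed from the integral representation $K_1(z) = \int_0^\infty e^{-z\cosh t}\cosh t\,dt$; the bracketing interval, tolerances and function names are illustrative choices.

```python
import math


def reduced_range_cdf(R, upper=20.0, steps=2000):
    """Psi(R) = z K_1(z), z = 2 exp(-R/2), with K_1 integrated by Simpson's rule."""
    z = 2.0 * math.exp(-R / 2.0)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        f = math.exp(-z * math.cosh(t)) * math.cosh(t)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return z * total * h / 3


def reduced_range_quantile(p, lo=-6.0, hi=25.0, tol=1e-6):
    """Reduced range R with Psi(R) = p, found by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reduced_range_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(reduced_range_quantile(0.5))    # median, compare with (30)
print(reduced_range_quantile(0.95))   # compare with Table Ia
print(reduced_range_quantile(0.99))   # compare with Table Ia
```

Bisection is slow but safe here because the probability is strictly increasing in R.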

A probability paper for the range may be constructed in the following way: The
observed ranges w are plotted on the vertical axis; the reduced ranges R on a
horizontal axis. The abscissa shows the probabilities

$\Psi(R) = G(w)$

³ The author wishes to express his sincere appreciation for the permission to use these
computations. The original tables give the probability and the distribution to 8 significant
decimal places at intervals $\Delta R = 1/100$. Lack of space prevents the reproduction of these
tables.



TABLE I

Asymptotic Probability and Asymptotic Distribution of the Reduced Range

      1              2              3              4
Reduced Range   Probability    Difference    Distribution
      R          $\Psi(R)$     $\Delta\Psi$    $\psi(R)$

    -3.0          .00050         .00274         .00212
    -2.5          .00324         .01032         .01057
    -2.0          .01356         .02693         .03386
    -1.5          .04048         .05251         .07705
    -1.0          .09299         .08141         .13419
    -0.5          .17440         .10533         .18969
     0.0          .27973         .11821         .22779
     0.5          .39794         .11859         .24075
     1.0          .51654         .10891         .23021
     1.5          .62545         .09327         .20346
     2.0          .71872         .07557         .16898
     2.5          .79429         .05860         .13360
     3.0          .85289         .04386         .10157
     3.5          .89675         .03192         .07483
     4.0          .92867         .02270         .05375
     4.5          .95136         .01584         .03783
     5.0          .96721         .01089         .02618
     5.5          .97810         .00739         .01787
     6.0          .98549         .00496         .01205
     6.5          .99045         .00330         .00805
     7.0          .99375         .00218         .00534
     7.5          .99594         .00143         .00351
     8.0          .99737         .00093         .00230
     8.5          .99830         .00061         .00150
     9.0          .99891         .00039         .00097
     9.5          .99930         .00025         .00062
    10.0          .99955         .00016         .00040
    10.5          .99972                        .00026


corresponding to the reduced ranges R. If the observations follow the theory,
the observed ranges are scattered around the straight line

(10') $w = 2u + \frac{R}{\alpha}.$

If the samples are drawn simultaneously, and if there is a constant interval of
time between the drawings, this interval may be used as unit of time for the
construction of the return periods $T(R)$ and ${}_1T(R)$ of a range equal to, or larger
than (smaller than) R where

$T(R) = \frac{1}{1 - \Psi(R)}; \quad {}_1T(R) = \frac{1}{\Psi(R)}.$


The first (second) notion applies to the range above (below) the median. The 
return periods are shown in an upper parallel to the abscissa. 

A scheme for this paper is given in Fig. 3. Such a paper will allow a graphical 
test for the fit of the observed ranges to our theory, and avoids any numerical 
calculations. Obviously this method may only be used if the initial distribution 
is symmetrical, unlimited, and of the exponential type, and if the sample size 
is so large that the asymptotic distribution holds. 






6. The range, the midrange, and the extremes. The asymptotic distribution
(27) of the reduced range was obtained by convolution of the asymptotic
distributions (5) of the extremes. The same method leads to the asymptotic
distribution of the reduced midrange [8]

(31) $v = \alpha(x_1 + x_n).$

TABLE Ia

The Reduced Range R as Function of Its Probability $\Psi(R)$

$\Psi(R)$     0      1      2      3      4      5      6      7      8      9

 .000        --      *   -3.20  -3.12  -3.05  -3.00  -2.96  -2.92  -2.89  -2.86
 .00         --  -2.83   -2.64  -2.52  -2.43  -2.36  -2.30  -2.25  -2.20  -2.16
 .0          --  -2.12   -1.84  -1.65  -1.51  -1.39  -1.28  -1.19  -1.10  -1.02
 .1       -0.95  -0.88   -0.81  -0.75  -0.69  -0.63  -0.58  -0.52  -0.47  -0.42
 .2       -0.37  -0.32   -0.27  -0.22  -0.18  -0.13  -0.09  -0.04   0.00   0.04
 .3        0.09   0.13    0.17   0.22   0.26   0.30   0.34   0.38   0.43   0.47
 .4        0.51   0.55    0.59   0.63   0.68   0.72   0.76   0.80   0.84   0.89
 .5        0.93   0.97    1.02   1.06   1.10   1.15   1.19   1.24   1.28   1.33
 .6        1.38   1.43    1.47   1.52   1.57   1.62   1.67   1.73   1.78   1.84
 .7        1.89   1.95    2.01   2.07   2.13   2.19   2.26   2.33   2.40   2.47
 .8        2.54   2.62    2.70   2.79   2.88   2.97   3.07   3.18   3.29   3.41
 .9        3.54   3.69    3.85   4.03   4.23   4.46   4.75   5.11   5.61   6.45
 .99       6.45   6.57    6.71   6.87   7.05   7.26   7.52   7.85   8.31   9.10
 .999      9.10   9.22    9.35   9.50   9.67   9.88  10.12  10.44      *      *

* These values have not been calculated.


On the other hand, the asymptotic distributions of the reduced extremes are
obtained by introducing the transformations

(32) $y_1 = \alpha(x_1 + u); \quad y_n = \alpha(x_n - u)$

into formulas (5). It is interesting to compare these four distributions and four
probabilities with each other. This is done in Figures 1 and 2. The probability
and the distribution of the midrange are practically identical with the probability
and distribution of the smallest value, for small values of the midrange, and
become practically identical with the probability and distribution of the largest
value for large values of the midrange. Fig. 2 shows that the asymptotic
distribution of the reduced range is less asymmetrical than the asymptotic
distributions of the reduced extremes.




[Figure; axis label: Variate]


Table II contains some characteristic values for these four asymptotic
distributions. The first three columns are obtained from previous publications
[6, 8]. The mean range is equal to the range of the means for the extremes.
The median of the range is larger than the range from the median of the largest
to the median of the smallest value. The mode of the range is slightly smaller
than the mean of the largest value. These statements hold, of course, only for
the reduced variates.



Fig. 2

From the mode $\tilde{R}$ of the reduced range given in equation (30) and the
transformation (10), the mode $\tilde{w}$ of the range itself is obtained as

$\tilde{w} = 2u + \frac{\tilde{R}}{\alpha}$

whereas the difference of the modes of the largest and of the smallest values is

$\tilde{x}_n - \tilde{x}_1 = 2u.$

Consequently

(33) $\tilde{w} = \tilde{x}_n - \tilde{x}_1 + \frac{\tilde{R}}{\alpha}.$


[Figure; axis label: Reduced Range]


For a symmetrical initial distribution of the exponential type the mode of the
range converges toward the range of the modes of the smallest and of the largest
value, provided that the parameter $\alpha$ increases without limit with the sample
size. Thus this convergence does not hold for all symmetrical distributions.

The last two lines in Table II give the four probabilities corresponding to the
intervals from the mean $\mu$ minus once (twice) the standard deviation $\sigma$, up to
the mean plus once (twice) the standard deviation. The first probability for
TABLE II

Characteristics for the 4 Asymptotic Reduced Distributions

1                                  2                      3                      4                        5
Characteristic                     Largest Value          Smallest Value         Midrange                 Range

Mode                               0                      0                      0                        .506
Expectation                        $\gamma$ = .57722      $-\gamma$ = -.57722    0                        $2\gamma$ = 1.15444
Median                             -lg lg 2 = .36651      lg lg 2 = -.36651      0                        .929
Seminvariant char. function        $\Gamma(1 - t)$        $\Gamma(1 + t)$        $\Gamma(1 - t)\,\Gamma(1 + t)$   $\Gamma^2(1 - t)$
Variance                           $\pi^2/6$ = 1.64493    $\pi^2/6$ = 1.64493    $\pi^2/3$ = 3.28986      $\pi^2/3$ = 3.28986
First moment quotient              $\beta_1$ = 1.29857    -1.29857               0                        .64928
Second moment quotient             $\beta_2$ = 5.4        5.4                    4.2                      4.2
95% Probability                    2.97                   1.10                   2.94                     4.46
99% Probability                    4.60                   1.53                   4.60                     6.45
$F(\mu + \sigma) - F(\mu - \sigma)$     .72               .72                    .72                      .71
$F(\mu + 2\sigma) - F(\mu - 2\sigma)$   .90               .90                    .95                      .95


the four distributions is about the same as for the normal distribution. The
second probability for the range and the midrange is about the same as for
the normal one.

7. The asymptotic distribution of the range for a symmetrical variate.

The asymptotic distribution of the reduced range R is, of course, independent of
the sample size, and parameter-free. Neither statement holds for the
distribution g(w) of the range proper which is, from (11),

(34) $g(w) = \alpha\,\psi[\alpha(w - 2u)].$

In this formula, the range is expressed in the same units as the initial variate.
The parameters $\alpha$ and u are functions of the sample size n, the function depending
upon the initial distribution. From equations (6), (8), (14) it follows that an
increase of the sample size has two influences on the distribution of the range.
The increase of the parameter u shifts the distribution toward the right without
changing its form, whereas the parameter $\alpha$ influences the shape of the
distribution. If $\alpha$ increases (decreases) with n, the distribution of the range shrinks
(spreads) with increasing sample size. If $\alpha$ is independent of n, an increase of
the sample size does not change the shape of the distribution. Only in the first
case may we increase the precision of the range by increasing the sample size.
The two parameters thus influence the range in the same way as they influence
the extreme values.

To use equation (34) for a given initial distribution and a given sample size,
we have to determine the expected largest value u and the parameter $\alpha$ as
functions of n. We may use the definitions (6), (7), (8) if the initial distribution is
known and of the exponential type, and if the sample size is so large that the
most probable largest value is sufficiently near to the solution of (7).

As a first example, consider the so-called logistic distribution. This
probability is

(35) $\Phi(x) = (1 + e^{-x})^{-1}.$

The initial distribution is

(35') $\varphi(x) = \Phi(x)(1 - \Phi(x))$

and the derivative is

(35'') $\varphi'(x) = \Phi(x)(1 - \Phi(x))(1 - 2\Phi(x)).$

Equation (6) becomes

$1 + e^{-u} = \frac{n}{n - 1}$

whence the expected largest value

(36) $u = \lg(n - 1).$

The most probable largest value $\tilde{x}_n$ for n observations is obtained from (7).
This equation becomes, from equation (35),

$(n - 1)(1 - \Phi(\tilde{x}_n)) = -1 + 2\Phi(\tilde{x}_n)$

whence $\Phi(\tilde{x}_n) = \frac{n}{n + 1}.$

Equation (35) leads to the most probable largest value

(36') $\tilde{x}_n = \lg n.$

Even for n as small as 30 the difference between $\tilde{x}_n$ and u is less than 1%.
Consequently the asymptotic form of the distribution of the range may be used even
for small samples. The two parameters are, from (36), (8) and (35'),

(37) $u = \lg(n - 1); \quad \alpha = 1 - \frac{1}{n}.$






Since $\alpha$ converges toward unity, an increase of the sample size shifts the
distribution of the range toward the right without influencing its shape: the precision of
any estimate made from the range cannot be increased by increasing the sample
size.
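The logistic case can be tabulated in a few lines. In the sketch below, u and the most probable largest value follow (36) and (36'); the expression $\alpha = 1 - 1/n$ is derived here from (8) together with (35') and (6), and should be read as this sketch's assumption. The function name is illustrative.

```python
import math


def logistic_extreme_parameters(n):
    """Return (u, mode, alpha) for the logistic initial distribution.

    u     = lg(n - 1)   expected largest value, equation (36)
    mode  = lg n        most probable largest value, equation (36')
    alpha = Phi(u) = 1 - 1/n, from (8) with phi = Phi * (1 - Phi)  (derived here)
    """
    u = math.log(n - 1)
    mode = math.log(n)
    alpha = 1.0 - 1.0 / n
    return u, mode, alpha


for n in (10, 30, 100, 1000):
    u, mode, alpha = logistic_extreme_parameters(n)
    print(n, round(u, 4), round(mode, 4), round((mode - u) / mode, 4), round(alpha, 4))
```

The fourth printed column is the relative difference between the mode and u, and the last column shows how quickly $\alpha$ approaches unity.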

The characteristic ranges introduced in paragraph 5 are obtained immediately:
the mean $\bar{w}$, the mode $\tilde{w}$, the median range $\breve{w}$ and the ranges $w_{.95}$ and $w_{.99}$

$\bar{w} = 2\lg n + 1.154; \quad \tilde{w} = 2\lg n + .506;$

$\breve{w} = 2\lg n + .929; \quad w_{.95} = 2\lg n + 4.46; \quad w_{.99} = 2\lg n + 6.45$

are parallel straight lines if traced as functions of the sample size n on
semi-logarithmic paper.

For the normal distribution we cannot expect such simple results. Here, u
and $\alpha$ can only be calculated as numerical functions of n although limiting forms
of these functions are known. The parameter $\alpha$ increases with n, and the
standard error of the range decreases without limit although very slowly. The
logistic distribution belongs to the first, the normal distribution to the second
class of initial distributions of the exponential type.

The probabilities and the distributions of the range for normal samples of
size 5, 10, and 20 as calculated by E. S. Pearson and H. O. Hartley [16] are
traced in Figures 4 and 5. Our aim is to trace the corresponding asymptotic
probabilities and distributions in order to see how far the asymptotic ranges
differ from the exact ones. However, we have first to settle the preliminary
question how far the most probable largest value $\tilde{x}_n$ differs from the expected
largest value u. The most probable largest value $\tilde{x}_n$ is obtained from (7) which
becomes, for the normal distribution,

(38) $\tilde{x}_n\,\Phi(\tilde{x}_n) = (n - 1)\,\varphi(\tilde{x}_n).$
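Equation (38) has no closed solution, but its root is easily bracketed. The sketch below assumes a standard normal initial distribution and solves (38) by bisection; the function name, bracket and tolerance are illustrative choices.

```python
from statistics import NormalDist


def modal_largest_value(n, lo=0.0, hi=10.0, tol=1e-10):
    """Most probable largest value of n normal observations (n >= 2).

    Solves x * Phi(x) = (n - 1) * phi(x) by bisection; the left side minus the
    right side is negative at x = 0 and increases with x for x > 0.
    """
    normal = NormalDist()

    def residual(x):
        return x * normal.cdf(x) - (n - 1) * normal.pdf(x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (3, 10, 100):
    mode = modal_largest_value(n)
    u = NormalDist().inv_cdf(1.0 - 1.0 / n)
    print(n, round(mode, 3), round(u, 3))
```

The printed pairs show the gap between the most probable and the expected largest value narrowing as n grows.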

The results $\tilde{x}_n$ as functions of n are shown in Table III, cols. 1 and 2. The
expected values u obtained from (6) are given in col. 3. For small samples, the
two values $\tilde{x}_n$ and u differ widely, as might be expected. We are inclined to
conclude that the asymptotic distribution of the range cannot hold for small
samples. However, the only legitimate conclusion to be drawn is that we cannot
calculate the two parameters in the way stated before, from (6) and (8). Instead,
we estimate them directly from the observations. The question of the most
efficient estimates of these parameters is not yet solved. The simplest way is to
use the mean range $\bar{w}_n$ and the standard deviation of the range $\sigma_{w,n}$ as given by
Tippett [20] and Pearson [15]. To distinguish these estimates from the asymptotic
values, we write the estimates with an index n. From (14) we obtain

(39) $\frac{1}{\alpha_n} = \frac{\sqrt{3}}{\pi}\,\sigma_{w,n}; \quad 2u_n = \bar{w}_n - \frac{2\gamma}{\alpha_n}.$
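The estimates (39) amount to two lines of arithmetic. The following sketch implements them and, for convenience, the inversion of (10) that turns a reduced range back into a range; the function names are illustrative, and the inputs in the example are the mean and standard deviation of the normal range for n = 5.

```python
import math

EULER_GAMMA = 0.5772156649015329


def estimate_range_parameters(mean_range, sd_range):
    """Return (1/alpha_n, 2*u_n) from the mean and standard deviation of the range."""
    inv_alpha = math.sqrt(3.0) / math.pi * sd_range       # first part of (39)
    two_u = mean_range - 2.0 * EULER_GAMMA * inv_alpha    # second part of (39)
    return inv_alpha, two_u


def range_from_reduced(R, inv_alpha, two_u):
    """Solve R = alpha * (w - 2u) for w."""
    return two_u + R * inv_alpha


inv_alpha, two_u = estimate_range_parameters(2.326, 0.8641)   # normal samples, n = 5
print(round(inv_alpha, 4), round(two_u, 3))
print(round(range_from_reduced(2.0, inv_alpha, two_u), 2))
```

The same two functions apply unchanged to observed ranges, with the sample mean and sample standard deviation of the ranges as inputs.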

Table III gives the calculated means $\bar{w}_n$ and standard deviations $\sigma_{w,n}$ of the
range, and the estimates $1/\alpha_n$ and $2u_n$. Fig. 6 shows how the most probable
largest values $\tilde{x}_n$ approach the expected largest value u with increasing sample
size. The estimate $u_n$ quickly approaches u. Besides we trace the mean range
$\bar{w}_n$, the standard error of the range $\sigma_{w,n}$, and $1/\alpha_n$ which is proportional to it.



From col. 8 it follows that the condition $\alpha u \ge 2$ is fulfilled from $n \ge 6$ onward.
The ranges obtained from the transformations

(40) $w = 2u_n + \frac{R}{\alpha_n}$

are given in Table IV, cols. 3-7. The asymptotic probabilities of the range as
obtained from the combination of columns 3-7, and col. 2 of Table IV are traced
in Fig. 4 as separated points. The asymptotic probabilities are situated very
near to the exact ones. Therefore the same method was used to calculate the
asymptotic probabilities of the range for n = 50 and n = 100 which have not
been calculated by Pearson. They too are traced in Fig. 4.



The asymptotic probabilities of the range hold even for small normal samples.
However, the parameters obtained from the exact distribution differ considerably
from their asymptotic values. In other words: The asymptotic probabilities of the
range hold even for small normal samples provided that the parameters are taken
from the observations.

To compare the asymptotic distributions of the normal range to the calculated
distributions, we attribute the asymptotic differences $\Delta\Psi/\alpha_n$ for a unit interval
$\Delta w = 1$ to the middle of the corresponding intervals. The results are traced in
Fig. 5 for n = 5, 10, 20, 50, 100. On the other hand, we take the differences
for unit intervals from Pearson's tables, and trace them in the same graph.
The fit of the calculated to the asymptotic values may be considered satisfactory.

TABLE III

Estimate of Parameters from the Calculated Distributions
of the Normal Range

   1          2              3             4              5               6               7              8
Sample     Largest value  Largest value  Mean range   Standard        Estimated       Estimated      Lower limit
size n     (modal)        (expected)                  deviation       parameter       parameter
           $\tilde{x}_n$  u              $\bar{w}_n$  $\sigma_{w,n}$  $1/\alpha_n$    $2u_n$         $2\alpha_n u_n$

   3          .765           .431          1.693         .8884           .4898           1.128           2.30
   4          .938           .674          2.059         .8798           .4851           1.499           3.09
   5         1.061           .842          2.326         .8641           .4764           1.776           3.73
  10         1.419          1.282          3.078         .797            .439            2.571           5.86
  20         1.740          1.645          3.735         .729            .402            3.271           8.14
  50         2.126          2.054          4.498         .653            .360            4.082          11.34
 100         2.377          2.326          5.015         .605            .334            4.630          13.86


TABLE IV

Asymptotic Probabilities for Normal Ranges Taken from Small Samples

    1           2             3        4        5        6        7
 Reduced   Probability     Normal ranges w = 2u_n + R/α_n for sample sizes
 range R   G(w) = Ψ(R)     n = 5   n = 10   n = 20   n = 50   n = 100

   -3         .000          .35     1.25     2.07     3.00     3.62
   -2         .014          .82     1.69     2.47     3.36     3.96
   -1         .093         1.30     2.13     2.87     3.72     4.30
    0         .280         1.78     2.57     3.27     4.08     4.63
    1         .517         2.25     3.01     3.67     4.44     4.96
    2         .719         2.73     3.45     4.07     4.80     5.30
    3         .853         3.21     3.89     4.48     5.16     5.63
    4         .929         3.68     4.33     4.88     5.52     5.97
    5         .967         4.16     4.77     5.28     5.88     6.30
    6         .985         4.63     5.20     5.68     6.24     6.63
    7         .994         5.11     5.64     6.09     6.60     6.97
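
The construction of columns 3-7 of Table IV can be retraced with a few lines of code. The following is a minimal sketch, not the author's procedure: it assumes the parameters 1/α_n and 2u_n are taken from columns 6 and 7 of Table III, and the names `PARAMS` and `normal_range` are introduced here for illustration only.

```python
# Parameters (1/alpha_n, 2u_n) copied from Table III, columns 6 and 7.
PARAMS = {
    5: (0.4764, 1.776),
    10: (0.439, 2.571),
    20: (0.402, 3.271),
    50: (0.360, 4.082),
    100: (0.334, 4.630),
}


def normal_range(reduced_range, n):
    """Return w = 2u_n + R/alpha_n for one of the tabulated sample sizes n."""
    inv_alpha, two_u = PARAMS[n]
    return two_u + reduced_range * inv_alpha


if __name__ == "__main__":
    # One line per reduced range R = -3, ..., 7; one column per sample size.
    for r in range(-3, 8):
        row = "  ".join(f"{normal_range(r, n):5.2f}" for n in PARAMS)
        print(f"{r:3d}  {row}")
```

Because the inputs carry only three or four decimals, the last printed digit may occasionally differ by one unit from the tabulated value.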


Fig. 5 shows furthermore how the distributions of the range are shifted toward 
the right and become more concentrated for increasing sample sizes. 

As an example for the practical application of the asymptotic distribution of
the range, we use an observed distribution of 50 ranges taken from samples of
n = 14 normal values given in Freeman's book [5] p. 128. The observed step
function is traced in Fig. 7. For reasons given in a previous article [7] we
attribute the cumulative frequency .5 to the smallest range 3, and the cumulative
frequency 49.5 to the largest range 18. To compare this step function with the
probability G(w), we estimate the two parameters $u_n$ and $\alpha_n$ from formula (39).
The mean range $\bar{w}_n$ and the estimate $s_{w,n}$ of the standard deviation of the ranges
are

$\bar{w}_n = 10.68; \quad s_{w,n} = 2.93.$

Consequently we obtain, from (39)

$1/\alpha_n = 1.61; \quad 2u_n = 8.82.$

The theoretical ranges are thus, from (40),

$w = 8.82 + 1.61\,R.$

The corresponding probabilities G(w) taken from Table I are traced in Fig. 7.
The fit of the theory to the observations is certainly satisfactory, especially if
we take into account that the ranges are given in integer numbers only.

[Fig. 7]
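
The numerical step of this example can be sketched as follows. Formula (39) is not reproduced in this part of the paper, so the sketch assumes it consists of the usual moment relations of the asymptotic range: the reduced range, being the sum of two independent reduced extreme values, has mean 2γ (γ being Euler's constant) and variance π²/3. The function name `range_parameters` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def range_parameters(mean_range, sd_range):
    """Moment estimates of 1/alpha_n and 2u_n (assumed form of formula (39)).

    Assumes the reduced range R = alpha_n (w - 2u_n) has mean 2*gamma
    and variance pi^2 / 3.
    """
    inv_alpha = sd_range * math.sqrt(3.0) / math.pi
    two_u = mean_range - 2.0 * EULER_GAMMA * inv_alpha
    return inv_alpha, two_u


# Observed mean and standard deviation of the 50 ranges quoted in the text.
inv_alpha, two_u = range_parameters(10.68, 2.93)

if __name__ == "__main__":
    print(f"1/alpha_n = {inv_alpha:.3f}, 2u_n = {two_u:.3f}")
```

Under these assumptions the estimates agree with the values 1.61 and 8.82 of the text to within rounding; the same relations reproduce columns 6 and 7 of Table III from columns 4 and 5.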

8. The mth range and the asymmetrical case. An obvious generalization
of the theory as established in paragraph 4 consists in the construction of the
asymptotic distribution of the mth range for an unlimited symmetrical distribution
of the exponential type. The mth range is the positive distance from the
mth observation from above, $x_m$, to the mth observation from below, ${}_m x$. We
suppose m to be very small compared to the sample size. Under the conditions
stated in the beginning, the joint distribution $\mathfrak{w}_n({}_m x, x_m)$ of the mth extreme
values splits into the product of the asymptotic distribution of the mth extreme
value from above, $f_m(x_m)$, by the asymptotic distribution of the mth extreme
value from below, ${}_m f({}_m x)$. Here, [6]

\[ f_m(x_m) = \frac{m^m}{(m-1)!}\,\alpha_m \exp\left[-m\alpha_m(x_m - u_m) - m e^{-\alpha_m(x_m - u_m)}\right] \]

\[ {}_m f({}_m x) = \frac{m^m}{(m-1)!}\,\alpha_m \exp\left[m\alpha_m({}_m x + u_m) - m e^{\alpha_m({}_m x + u_m)}\right] \]

The sample size must be so large that the most probable mth extreme value $\tilde{x}_m$
is sufficiently near to $u_m$ which is defined as the solution of

\[ \Phi(u_m) = 1 - \frac{m}{n}. \]

The factor $\alpha_m$ defined by

\[ \alpha_m = \frac{\varphi(u_m)}{1 - \Phi(u_m)} \]

is related to the asymptotic standard error $\sigma_m$ of the mth extreme value by

\[ \alpha_m^2 \sigma_m^2 = \frac{\pi^2}{6} - \sum_{\nu=1}^{m-1} \frac{1}{\nu^2}. \]

The joint asymptotic distribution $\mathfrak{w}({}_m x, w_m)$ of the mth smallest value and the
mth range

(41) \[ w_m = x_m - {}_m x \]

is

\[ \mathfrak{w}({}_m x, w_m) = \left[\frac{m^m}{(m-1)!}\right]^2 \alpha_m^2 \exp\left[-m\alpha_m(w_m - 2u_m) - m e^{\alpha_m({}_m x + u_m)} - m e^{-\alpha_m(w_m + {}_m x - u_m)}\right]. \]

The asymptotic distribution $g(w_m)$ of the mth range is, dropping the index m of
the variable ${}_m x$,

\[ g(w_m) = \int_{-\infty}^{+\infty} \mathfrak{w}(x, w_m)\,dx. \]

Again we introduce a reduced range $R_m$ defined by

(42) \[ \alpha_m(w_m - 2u_m) = R_m \ge -2\alpha_m u_m \]

and put for the integration

\[ \alpha_m(x + u_m) = y. \]

Then the asymptotic distribution $\psi(R_m)$ of the reduced mth range is

(43) \[ \psi(R_m) = \left[\frac{m^m}{(m-1)!}\right]^2 e^{-mR_m} \int_{-\infty}^{+\infty} \exp\left[-m e^{y} - m e^{-y - R_m}\right] dy. \]

The probability $\Psi(R_m)$ for the mth range

\[ \Psi(R_m) = \int_{-2\alpha_m u_m}^{R_m} \psi(z)\,dz \]

cannot be reduced to a single integral. This is due to the fact that the probabilities
of the mth extreme values cannot be written down except in the integral
form [6]. No differential equation similar to (17) exists. However, the function
(43) could be calculated by numerical methods. The mean $\bar{R}_m$, the generating
function and the moments of the mth range have been given in a previous
paper [8].

For the sake of completeness, consider finally an unlimited asymmetrical initial
distribution of the exponential type. In this case, the joint distribution of the
smallest and of the largest value splits again, for large samples, into the product
of the asymptotic distributions $f_1(x_1)$ and $f_n(x_n)$ of the smallest and of the largest
values which are now [6]

\[ f_1(x_1) = \alpha_1 \exp\left[\alpha_1(x_1 - u_1) - e^{\alpha_1(x_1 - u_1)}\right]; \]

\[ f_n(x_n) = \alpha_n \exp\left[-\alpha_n(x_n - u_n) - e^{-\alpha_n(x_n - u_n)}\right]. \]

Here, $\alpha_n$ and $u_n$ are defined, as previously, by (6) and (8). The sample must
be so large that the most probable smallest value $\tilde{x}_1$ is sufficiently near to the
solution of

\[ \Phi(u_1) = \frac{1}{n}. \]

The factor $\alpha_1$ defined by

\[ \alpha_1 = \frac{\varphi(u_1)}{\Phi(u_1)} \]

is related to the asymptotic standard error $\sigma_1$ of the smallest value by

\[ \alpha_1 \sigma_1 = \frac{\pi}{\sqrt{6}}. \]

The joint asymptotic distribution of the smallest value $x_1$ and the range w

\[ \mathfrak{w}(x_1, w) = \alpha_1 \alpha_n \exp\left[\alpha_1(x_1 - u_1) - \alpha_n(x_1 + w - u_n) - e^{\alpha_1(x_1 - u_1)} - e^{-\alpha_n(x_1 + w - u_n)}\right] \]

contains four parameters instead of the two which exist in the symmetrical case.
However, the number of parameters may be reduced to one. We introduce a
reduced range R defined by

(44) \[ R = \alpha_n(w - u_n + u_1) \]

being the range itself minus the range of the modes divided by a factor proportional
to the standard error of the largest value. If we put

(45) \[ \alpha_1(x_1 - u_1) = y, \qquad \frac{\alpha_n}{\alpha_1} = \beta \]

the distribution $\psi(R)$ of the reduced range becomes, in the asymmetrical case,

(46) \[ \psi(R) = e^{-R} \int_{-\infty}^{+\infty} \exp\left[y(1 - \beta) - e^{y} - e^{-\beta y - R}\right] dy \]

and the probability $\Psi(R)$ for the reduced range is

(47) \[ \Psi(R) = \int_{-\infty}^{+\infty} \exp\left[y - e^{y} - e^{-\beta y - R}\right] dy \]

a formula which may immediately be verified by differentiation with respect to
R. The mode $\tilde{R}$ of the range is the solution of

\[ \psi(\tilde{R}) = e^{-\tilde{R}} \int_{-\infty}^{+\infty} \exp\left[y(1 - 2\beta) - \tilde{R} - e^{y} - e^{-\beta y - \tilde{R}}\right] dy. \]

Contrary to the symmetrical case, the latter integral cannot be expressed by the
probability, and no simple differential equation similar to (17) exists. The expressions
(46) and (47) contain a single constant $\beta$ measuring the asymmetry of
the initial distribution. In the symmetrical case, $\beta = 1$, we obtain, of course, the
previous formulas (12) and (13). In the asymmetrical case, the mean, the
variance, and the higher moments of the mth range may be derived from the
generating function given in a previous paper [8].

The asymptotic distribution of the mth range in the asymmetrical case can
easily be obtained by combining the two procedures used in this paragraph.
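
Formula (47) lends itself to direct numerical evaluation. The sketch below is an illustration under stated assumptions, not the method used for the tables of this paper: it applies Simpson's rule on a finite interval (the integrand is negligible outside it for moderate R and β), and the function name `range_probability` and its default limits are choices made here. With β = 1 it should reproduce column 2 of Table IV.

```python
import math


def range_probability(reduced_range, beta=1.0, lo=-40.0, hi=10.0, steps=5000):
    """Simpson's rule for Psi(R) = integral of exp[y - e^y - e^(-beta*y - R)] dy.

    beta = 1 is the symmetrical case; steps must be even.
    """
    def integrand(y):
        return math.exp(y - math.exp(y) - math.exp(-beta * y - reduced_range))

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes.
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    for r in (0, 1, 7):
        print(r, round(range_probability(float(r)), 3))
```

Values of β other than 1 give the asymmetrical case; for very large β the lower limit `lo` would have to be raised to avoid overflow in the inner exponential.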

REFERENCES

[1] L. von Bortkiewicz, "Variationsbreite und mittlerer Fehler," Sitzungsberichte d.
Berliner Math. Gesellschaft, Vol. 21 (1921).

[2] ----, "Die Variationsbreite beim Gauss'schen Fehlergesetz," Nordisk Statistisk
Tidskrift, Vol. 1 (1922).

[3] A. G. Carlton, "Estimating the parameters of a rectangular distribution," Annals
of Math. Stat., Vol. 17 (1946).

[4] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution
of the largest or smallest member of a sample," Proc. Cambridge Phil. Soc.,
Vol. 24 (1928).

[5] H. A. Freeman, Industrial Statistics, John Wiley and Sons, 1942.

[6] E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Ann. Inst. H.
Poincaré, Vol. 4 (1935).

[7] ----, "On serial numbers," Annals of Math. Stat., Vol. 14 (1943).

[8] ----, "Ranges and midranges," Annals of Math. Stat., Vol. 15 (1944).

[9] ----, "On the independence of the extremes in a sample," Annals of Math. Stat.,
Vol. 17 (1946).

[10] H. O. Hartley, "The range in random samples," Biometrika, Vol. 32 (1942).

[11] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, London, 1943.

[12] A. T. McKay and E. S. Pearson, "A note on the distribution of range in sample sizes
of n," Biometrika, Vol. 25 (1933).

[13] ----, "Distribution of the difference between the extreme observations and the
sample mean in samples of n from a normal universe," Biometrika, Vol. 27
(1935).

[14] British Association for the Advancement of Science, Mathematical Tables Vol.
VI: Bessel Functions, Part I: Functions of order zero and unity, Cambridge
Univ. Press, 1937.

[15] E. S. Pearson, "A further note on the distribution of range in samples taken from a
normal population," Biometrika, Vol. 18 (1926).

[16] E. S. Pearson and H. O. Hartley, "The probability integral of the range in samples
of n observations from a normal population," Biometrika, Vol. 32 (1942).

[17] Karl Pearson, Tables for Statisticians and Biometricians, Part II, Cambridge Univ.
Press, 1931.

[18] Student, "Errors in routine analysis," Biometrika, Vol. 19 (1927).

[19] Arnold N. Lowan (technical Director), Table of the Bessel Functions $K_0(x)$ and $K_1(x)$
for x between zero and one, Math. Tables Proj., New York.

[20] L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a
normal population," Biometrika, Vol. 17 (1925).

Addition at proof reading:

G. Elfving's article "The asymptotical distribution of range in samples from a normal
population", Biometrika, Vol. 35 (1947), appeared when this manuscript was ready for
print. Elfving considers a probability transformation of the range whereas we deal with
the range itself. His distribution requires the knowledge of the initial distribution and
of the sample size, whereas this knowledge is not required in our asymptotic formula.



LOW MOMENTS FOR SMALL SAMPLES: A COMPARATIVE STUDY OF 

ORDER STATISTICS 

By Cecil Hastings, Jr., Frederick Mosteller, John W. Tukey, 
and Charles P. Winsor 

Douglas Aircraft Co ., Harvard University , Princeton University, 
and The Johns Hopkins University 

1. Summary. The means, variances, and covariances for samples of size
n ≤ 10 from the normal distribution, a selected long-tailed distribution, and the
uniform distribution are tabled and compared with the usual asymptotic
approximations. The methods of computation used and the accuracy expected
are discussed. Use is made of the representation of an arbitrarily distributed
variate as a monotone function of a uniformly (rectangularly) distributed
variate. It is hoped that these tables will encourage experimentation with new
statistical procedures.

2. Introduction. Two sorts of statistical procedures have been widely
exploited in theoretical statistics: first, the use of linear and quadratic
combinations of the unordered observations and, second, the use of ranked (ordered)
observations. Statistics based on ordered observations have recently been
dubbed systematic statistics [2, Mosteller, 1946]. Analytic processes and a few
necessary numerical tables have advanced the study of the first procedure greatly,
at least for the special case of the normal distribution; but analytic procedures
have not done much to exhibit the behavior of systematic statistics and the
necessary tables have been lacking.

It would be very helpful to have (1) at least the first two moments (including 
product moments) of the order statistics, and (2) tables of the percentage points 
of their distributions, for samples of sizes from 1 to some moderately large value 
such as 100 and for a large representative family of distributions. This is a 
large order and will require much computation. 

The first step in this direction was taken by Fisher and Yates [1] by tabulating
the means, to two decimal places, of all order statistics from normal samples of
size n ≤ 50. The present paper continues the process by supplying all means,
variances, and covariances for samples of size n ≤ 10 from (a) the normal
distribution, (b) the uniform (rectangular) distribution, (c) a special distribution
with long tails. For purposes of comparison, we also supply approximate
means, variances, and covariances for the normal and the special distribution
computed from suitable asymptotic formulas.

The special distribution has the representing function

(1) \[ r(u) = (1 - u)^{-1/10} - u^{-1/10}, \]

where u has the uniform distribution on the interval [0, 1], and x = r(u) is the
variable whose order statistics interest us. This special distribution was
especially constructed 1) to have high tails and 2) to provide moments of order
statistics in closed form which could be evaluated with a reasonable amount of
labor. The normal distribution is rather unreasonable in this latter respect,
there being no known expression except in terms of single and double
quadratures of some considerable numerical difficulty.

We have restricted ourselves to samples of size n ≤ 10, and to only three
distributions, all of these symmetrical, because of limited man-power rather than
limited interest. Additional tables of a similar nature will surely prove helpful.

In order to obtain even as much information as provided in this paper, it has
been necessary to make a joint effort, dividing the labor. The various parts of
the work have been carried out more or less separately by the various authors:
the means and variances for the normal by Mosteller, the covariances for the
normal (which, with their double quadratures, required far more time than all
the other thought and computation combined) by Hastings with some assistance
from Mosteller, the choice of the special distribution by Tukey, and the
computation for it by Winsor.


3. Results. In this section we provide the various tables that have been 
computed. 

Table I gives the mean and standard deviation of the ith order statistic
x(i | n), [or $x_{i|n}$; we use whichever notation seems less likely to confuse and
agree that x(1 | n) ≥ x(2 | n) ≥ ··· ≥ x(n | n)] from a sample of size n drawn
from a uniform (U), normal (N), and a special distribution (S). All three
distributions have been adjusted to have zero mean and unit variance. In
addition Table I gives approximations for the mean and standard deviation as
computed from asymptotic formulas for the normal (AN) and the special (AS).
If f(x) is the density function, the asymptotic approximation for the mean
m(i | n) of the ith order statistic from a sample of size n is obtained by solving
the equation

\[ \int_{m(i|n)}^{\infty} f(x)\,dx = \frac{i}{n+1} \]

for m(i | n). Similarly the formula used for the asymptotic variance of x(i | n)
is

\[ \frac{i(n - i + 1)}{n(n+1)^2 \{f[m(i|n)]\}^2}. \]

Values are given for n = 1, 2, ···, 10 and i = 1, ···, [(n + 1)/2]. If m(i | n) is
an entry in the table for means, a missing entry m(n − i + 1 | n) = −m(i | n);
if σ(i | n) is an entry in the table of standard deviations, a missing entry
σ(n − i + 1 | n) = σ(i | n).

TABLE I

Means and standard deviations of order statistics x(i|n) for uniform distribution
(U), normal (N), special (S), asymptotic normal (AN),
asymptotic special (AS)

                        Mean                                    Standard Deviation
 n  i        U        N        S      AN      AS           U        N        S      AN      AS

 1  1        0        0        0       0       0     1.00000  1.00000  1.00000  1.2533   .9804

 2  1   .57735   .56419   .53493   .4307   .3418      .81650   .82565   .84490   .9168   .7486

 3  1   .86603   .84628   .80240   .6745   .5466      .67082   .74798   .82783   .7867   .6823
    2        0        0        0       0       0      .77460   .66983   .58457   .7236   .5660

 4  1  1.03923  1.02938   .98473   .8416   .6954      .56569   .70122   .82982   .7144   .6542
    2   .34641   .29701   .25540   .2533   .1992      .69282   .60038   .52582   .6340   .5035

 5  1  1.15470  1.16296  1.12449   .9674   .8136      .48795   .66898   .83642   .6670   .6415
    2   .57735   .49502   .42567   .4307   .3418      .61721   .55814   .50390   .5798   .4730
    3        0        0        0       0       0      .65465   .53557   .44903   .5605   .4384

 6  1  1.23718  1.26721  1.23847  1.0676   .9114      .42857   .64492   .84423   .6331   .6330
    2   .74231   .64176   .55458   .5659   .4539      .55328   .52874   .49425   .5426   .4567
    3   .24744   .20155   .16785   .1800   .1412      .60609   .49620   .41648   .5147   .4057

 7  1  1.29904  1.35218  1.33506  1.1504   .9957      .38188   .62603   .85217   .6072   .6141
    2   .86603   .75737   .65892   .6745   .5462      .50000   .50670   .48992   .5150   .4359
    3   .43301   .35271   .29375   .3186   .2512      .55902   .46875   .39963   .4826   .3772
    4        0        0        0       0       0      .57735   .45874   .37747   .4737   .3617

 8  1  1.34715  1.42360  1.41892  1.2207  1.0697      .34427   .61066   .85988   .5867   .6276
    2   .96225   .85222   .74690   .7647   .6259      .45542   .48930   .48823   .4936   .4402
    3   .57735   .47282   .39498   .4307   .3418      .51640   .44807   .38998   .4584   .3743
    4   .19245   .15251   .12502   .1397   .1091      .54433   .43264   .35616   .4447   .3494

 9  1  1.38564  1.48501  1.49358  1.2816  1.1358      .31334   .59780   .86725   .5691   .6268
    2  1.03923   .93230   .82317   .8416   .6954      .41779   .47508   .48800   .4763   .4361
    3   .69282   .57197   .47995   .5244   .4191      .47863   .43171   .38414   .4393   .3722
    4   .34641   .27453   .22504   .2533   .1992      .51168   .41303   .34321   .4227   .3356
    5        0        0        0       0       0      .52223   .40751   .33173   .4178   .3268

10  1  1.41713  1.53875  1.56057  1.3352  1.1956      .28748   .58681   .87423   .5557   .6275
    2  1.10222  1.00135   .89062   .9085   .7574      .38569   .46318   .48859   .4619   .4334
    3   .78730   .65608   .55336   .6046   .4866      .44536   .41826   .38054   .4238   .3604
    4   .47238   .37572   .30866   .3488   .2754      .48105   .39756   .33477   .4052   .3261
    5   .15746   .12274   .09961   .1142   .0894      .49793   .38857   .31190   .3973   .3117

Table II gives the variances and covariances of the order statistics for the
normal distribution (N) and the same quantities as approximated by the asymptotic
formulas (AN). The asymptotic covariance between x(i | n) and x(j | n)
is given by

\[ \frac{j(n - i + 1)}{n(n+1)^2 f[m(i|n)]\,f[m(j|n)]}, \qquad j \le i. \]

Symmetry relations exist for supplying the missing entries, 

cov [x(i | n ), x(j | n)] = cov [x(n — i + 1 | n), x(n — j + 1 | n)]. 

It might seem more natural to use the factor n + 2 rather than n in the denominator
of the asymptotic variances and covariances so that the formulas would
more nearly agree with those for the uniform distribution. However the use of
n gives much better approximations for the normal and the special distribution.
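
The asymptotic formulas of this section are easily evaluated for the normal case. The sketch below is illustrative only: it relies on the standard-library class `statistics.NormalDist`, and the function names and the optional argument `denominator_n` (which allows n + 2 to be substituted for n) are introduced here for the comparison discussed above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def asymptotic_mean(i, n):
    """m(i|n): point whose upper-tail probability is i/(n+1); i = 1 is the largest."""
    return STD_NORMAL.inv_cdf(1.0 - i / (n + 1))


def asymptotic_cov(i, j, n, denominator_n=None):
    """Asymptotic covariance of x(i|n) and x(j|n); i == j gives the variance.

    denominator_n defaults to n, the factor adopted in the paper;
    pass n + 2 to obtain the alternative discussed in the text.
    """
    lo, hi = min(i, j), max(i, j)
    d = n if denominator_n is None else denominator_n
    f_lo = STD_NORMAL.pdf(asymptotic_mean(lo, n))
    f_hi = STD_NORMAL.pdf(asymptotic_mean(hi, n))
    return lo * (n - hi + 1) / (d * (n + 1) ** 2 * f_lo * f_hi)


if __name__ == "__main__":
    n = 2
    print("AN mean      :", round(asymptotic_mean(1, n), 4))
    print("AN sd        :", round(asymptotic_cov(1, 1, n) ** 0.5, 4))
    print("AN covariance:", round(asymptotic_cov(1, 2, n), 2))
    print("with n + 2   :", round(asymptotic_cov(1, 2, n, denominator_n=n + 2), 2))
```

For n = 2 the first three printed values should agree with the AN entries of Tables I and II; the last line shows how far the n + 2 variant departs from them.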

Table III gives the variances and covariances of the order statistics for the
uniform distribution (U), and Table IV gives the corresponding results for the
special distribution (S). Table V gives the asymptotic variances and
covariances for the special distribution (AS).

Table VI compares the correlation coefficients between the order statistics
x(i | n) and x(j | n) for the uniform (U), the normal (N), and the special
distribution (S).

It seems worthwhile to call attention to the following: 

(1). Even for n = 10, the asymptotic formulas do not give satisfactory mean
values for the order statistics.

(2). For n > 8, the asymptotic standard deviations for the normal are close
enough to be very useful. For the special distribution we must except the two
order statistics on each end from this statement.

TABLE II

Variances and covariances of the order statistics x(i|n) for the
normal (N) and the asymptotic normal (AN)

Each cell gives N/AN. Entries marked [?] are illegible in the source.

 n  i    j=1      j=2      j=3      j=4      j=5      j=6      j=7      j=8      j=9      j=10

 2  1  .68/.84  .32/.42

 3  1  .56/.62  .28/.33  .17/.21
    2           .45/.52

 4  1  .49/.51  .24/.28  .16/.18  .11/.13
    2           .36/.40  .24/.27

 5  1  .45/.44  .22/.24  .15/.17  .11/.12  .07/.09
    2           .31/.34  .21/.23  .15/.17
    3                    .29/.31

 6  1  .42/[?]  .21/.22  .13/.15  .11/.12  .07/[?]  [?]/[?]
    2           .28/.29  .19/.20  .14/.15  [?]/.12
    3                    .25/.26  .18/.20

 7  1  .39/.37  .20/.20  .13/.14  .10/.11  .08/.09  .06/.07  .05/.05
    2           .26/.27  .17/.19  .13/.14  .10/.11  .08/.09
    3                    .22/.23  .17/.18  .13/.14
    4                             .21/.22

 8  1  .37/.34  .19/.19  .13/.13  .09/.10  .08/.08  .06/.07  .04/.05  .04/.04
    2           .24/.24  .17/.17  .12/.13  .10/.10  .08/.09  .07/.07
    3                    .20/.21  .15/.16  .12/.13  .09/.11
    4                             .19/.20  .15/.16

 9  1  .36/.32  .18/.18  .12/.13  .09/.10  .07/.08  .06/.07  .05/.05  .04/.05  .04/.04
    2           .23/.23  .16/.16  .11/.12  .09/.10  .08/.08  .06/.07  .05/.06
    3                    .19/.19  .14/.15  .11/.12  .10/.10  .08/.08
    4                             .17/.18  .14/.14  .12/.12
    5                                      [?]/[?]

10  1  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  [?]/[?]  .03/.03
    2           .21/.21  .14/.15  .11/.12  .09/.09  .07/.08  .06/.07  .05/.06  .04/.05
    3                    .17/.18  .13/.14  .11/.11  .09/.09  .08/.08  .06/.07
    4                             .16/.16  .12/.13  .11/.11  .09/.09
    5                                      .15/.16  .13/.13










(3). For n > 8, the asymptotic variances and covariances of the normal are
close enough for many, if not most, purposes.







(4). For the special distribution, only the variances and covariances of
moderately central order statistics are adequately given by the asymptotic formulas.

TABLE III

Variances and covariances for the uniform distribution (U)

 n  i   j = 1       2       3       4       5       6       7       8       9      10

 2  1  .66667  .33333

 3  1  .45000  .30000  .15000
    2          .60000

 4  1  .32000  .24000  .16000  .08000
    2          .48000  .32000

 5  1  .23810  .19047  .14286  .09522  .04762
    2          .38095  .28571  .19047
    3                  .42857

 6  1  .18367  .15306  .12245  .09184  .06122  .03061
    2          .30612  .24490  .18367  .12245
    3                  .36735  .27551

 7  1  .14583  .12500  .10417  .08333  .06250  .04167  .02083
    2          .25000  .20833  .16667  .12500  .08333
    3                  .31250  .25000  .18750
    4                          .33333

 8  1  .11852  .10370  .08889  .07407  .05925  .04444  .02963  .01481
    2          .20741  .17778  .14815  .11852  .08889  .05925
    3                  .26667  .22222  .17778  .13333
    4                          .29630  .23704

 9  1  .09818  .08727  .07636  .06545  .05455  .04363  .03273  .02182  .01091
    2          .17455  .15273  .13091  .10909  .08727  .06545  .04363
    3                  .22909  .19636  .16364  .13091  .09818
    4                          .26182  .21818  .17455
    5                                  .27273

10  1  .08264  .07438  .06611  .05785  .04959  .04132  .03306  .02479  .01653  .00826
    2          .14876  .13223  .11570  .09917  .08264  .06611  .04959  .03306
    3                  .19835  .17355  .14876  .12397  .09917  .07438
    4                          .23140  .19835  .16529  .13223
    5                                  .24793  .20661






(5). The correlation coefficients change rather little from distribution to
distribution, the poorest approximation being for end order statistics.





TABLE IV

Variances and covariances for the special distribution (S)

 n  i   j = 1       2       3       4       5       6       7       8       9      10

 2  1  .71386  .28615

 3  1  .68530  .24214  .15957
    2          .34172

 4  1  .68860  .23277  .14141  .11123
    2          .27649  .17532

 5  1  .69960  .23154  .13655  .10004  .08614
    2          .25391  .15418  .11490
    3                  .20163

 6  1  .71272  .23310  .13544  .09667  .07786  .07080
    2          .24429  .14506  .10486  .08514
    3                  .17345  .12762

 7  1  .72619  .23577  .13582  .09565  .07517  .06109  .06012
    2          .24002  .14065  .10004  .07913  .06776
    3                  .15970  .11509  .09184
    4                          .14249

 8  1  .73940  .23890  .13687  .09562  .07420  .06179  .05471  .05291
    2          .23837  .13850  .09754  .07608  .06359  .05615
    3                  .15208  .10822  .08499  .07138
    4                          .12685  .10053

 9  1  .75211  .24219  .13825  .09608  .07398  .06085  .05266  .04789  .04721
    2          .23814  .13756  .09625  .07443  .06141  .05327  .04852
    3                  .14756  .10413  .08097  .06707  .05835
    4                          .11780  .09225  .07680
    5                                  .11004

10  1  .76428  .24550  .13978  .09680  .07414  .06053  .05176  .04601  .04271  .04272
    2          .23872  .13732  .09565  .07354  .06018  .05156  .04594  .04266
    3                  .14481  .10158  .07846  .06444  .05533  .04940
    4                          .11207  .08707  .07180  .06186
    5                                  .10016  .08300






4. Methods of calculation and accuracy for the normal distribution. The
means and variances of the order statistics for the normal distribution were
obtained from direct quadrature of forms like

\[ \int_{-\infty}^{\infty} x^k [F(x)]^i [1 - F(x)]^{n-i-1} f(x)\,dx, \qquad k = 1, 2, \]

where

\[ F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt \quad \text{and} \quad f(x) = F'(x). \]

It is believed that the means are correct to within one unit in the fifth decimal
and that the standard deviations are correct to within 2 or 3 units in the fifth
decimal.
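
A quadrature of this kind can be sketched in a few lines. The code below is an illustration and not the authors' computing scheme: it uses Simpson's rule on the interval (−10, 10), writes the density of the ith largest of n observations in the standard form n!/[(n − i)!(i − 1)!] F^{n−i}(1 − F)^{i−1} f, and the function names are introduced here.

```python
import math


def order_statistic_moment(i, n, k, lo=-10.0, hi=10.0, steps=4000):
    """E[x(i|n)^k] for a standard normal sample; x(1|n) is the largest value."""
    coeff = math.factorial(n) / (math.factorial(n - i) * math.factorial(i - 1))

    def integrand(x):
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return coeff * x ** k * cdf ** (n - i) * (1.0 - cdf) ** (i - 1) * pdf

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for m in range(1, steps):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes.
        total += (4 if m % 2 else 2) * integrand(lo + m * h)
    return total * h / 3.0


def mean_and_sd(i, n):
    """Mean and standard deviation of x(i|n) from the first two moments."""
    mean = order_statistic_moment(i, n, 1)
    second = order_statistic_moment(i, n, 2)
    return mean, math.sqrt(second - mean * mean)


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, [round(v, 5) for v in mean_and_sd(1, n)])
```

The printed values may be compared with the N columns of Table I for i = 1.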


TABLE V

Variances and covariances of the special distribution as computed
from asymptotic formulas

 n  i   j = 1       2       3       4       5       6       7       8       9      10

 2  1  .56044  .28022

 3  1  .46550  .22297  .15517
    2          .32038

 4  1  .42792  .20168  .13444  .10698
    2          .25347  .16898

 5  1  .41156  .19167  .12579  .09605  .08231
    2          .22368  .14679  .11208
    3                  .19221

 6  1  .40072  .18667  .12105  .09080  .07464  .06679
    2          .20861  .13529  .10147  .08341
    3                  .16457  .12343

 7  1  .37715  .17527  .11304  .08394  .06782  .05842  .05388
    2          .19004  .12258  .09103  .07354  .06335
    3                  .14232  .10569  .08538
    4                          .13731

 8  1  .39389  .18276  .11746  .08669  .06935  .05873  .05221  .04924
    2          .19382  .12458  .09194  .07355  .06229  .05538
    3                  .14011  .10341  .08272  .07005
    4                          .12211  .09769

 9  1  .39286  .18226  .11881  .08591  .06829  .05727  .05092  .04556  .04367
    2          .19019  .12398  .08965  .07126  .05977  .05313  .04754
    3                  .13855  .10019  .07963  .06678  .05938
    4                          .11265  .08958  .07512
    5                                  .10680

10  1  .39373  .18242  .11677  .08560  .06775  .05646  .04891  .04379  .04054  .03937
    2          .18784  .12024  .08813  .06977  .05814  .05036  .04508  .04174
    3                  .12988  .09520  .07536  .06280  .05440  .04871
    4                          .10633  .08417  .07014  .06076
    5                                  .09716  .08098






The evaluation of the covariances was much more troublesome, requiring the
evaluation of iterated integrals of the form

\[ \int_{-\infty}^{\infty} x f(x) F^i(x) \int_{-\infty}^{x} [1 - F(t)]^j\,dt\,dx. \]

Necessary linear combinations of such forms give rise to considerable loss of
accuracy. The covariances are believed to be correct to within 1 unit in the
second decimal (except for one or two values which may be off by two units).
Better tables of these covariances are badly needed, and it is hoped that someone
will provide them.

The asymptotic values are correct to the two decimals given.

TABLE VI

Correlation coefficients × 10² between order statistics x(i | n), x(j | n) for the
uniform (U), the normal (N), and the special distribution (S)

[Table VI is not reproduced in this text.]
















5. Computation in terms of the representing function. It will prove
convenient in working with the special distribution, as indeed it does in many
statistical procedures, to introduce the representing function r(u), which is a
monotone function such that

\[ \Pr\{r(u_1) < x < r(u_2)\} = u_2 - u_1, \qquad u_2 > u_1. \]

Thus if u has a uniform (= rectangular on [0, 1]) distribution then x = r(u)
defines a variate with the given distribution.

The ith order statistic of n from the uniform distribution, $u_{i|n}$, is distributed
according to

\[ i \binom{n}{i} u^{n-i} (1 - u)^{i-1}\,du, \qquad 0 < u < 1, \]

where it is important to remember that $u_{1|n}$ is the largest and not the smallest
order statistic; and the joint distribution of $u = u_{i|n}$ and $v = u_{j|n}$, (j > i),
is given by

\[ i(j - i) \binom{n}{i,\ j-i,\ n-j} v^{n-j} (u - v)^{j-i-1} (1 - u)^{i-1}\,du\,dv, \qquad 0 < v < u < 1, \]

where $\binom{n}{i,\ j-i,\ n-j}$ is a multinomial coefficient.

The means, variances, and covariances which we desire can be written as
follows (it is immaterial whether we think of expectations over x's or over u's):

\[ E(x_{i|n}) = E(r(u_{i|n})) = i \binom{n}{i} \int_0^1 r(u)\,u^{n-i} (1 - u)^{i-1}\,du, \]

\[ \operatorname{var}(x_{i|n}) = E(x_{i|n}^2) - (E(x_{i|n}))^2 = E(r^2(u_{i|n})) - (E(x_{i|n}))^2 \]
\[ = i \binom{n}{i} \int_0^1 r^2(u)\,u^{n-i} (1 - u)^{i-1}\,du - (E(x_{i|n}))^2, \]

\[ \operatorname{cov}(x_{i|n}, x_{j|n}) = E(x_{i|n} x_{j|n}) - E(x_{i|n}) E(x_{j|n}) = E(r(u_{i|n}) r(u_{j|n})) - E(x_{i|n}) E(x_{j|n}) \]
\[ = i(j - i) \binom{n}{i,\ j-i,\ n-j} \int_0^1 \int_0^u r(u) r(v)\,v^{n-j} (u - v)^{j-i-1} (1 - u)^{i-1}\,dv\,du - E(x_{i|n}) E(x_{j|n}). \]

Introducing $E_{s,t}$ by

\[ E_{s,t} = \int_0^1 \int_0^u r(u) r(v)\,u^s v^t\,dv\,du, \]

we have

\[ E(x_{i|n} x_{j|n}) = i(j - i) \binom{n}{i,\ j-i,\ n-j} \sum_{a=0}^{i-1} \sum_{b=0}^{j-i-1} (-1)^{a+b} \binom{i-1}{a} \binom{j-i-1}{b} E_{j-i-1+a-b,\ n-j+b} \]

and, in particular,

\[ E(x_{1|2} x_{2|2}) = 2E_{0,0}, \]
\[ E(x_{3|5} x_{4|5}) = 60E_{2,1} - 120E_{1,1} + 60E_{0,1}. \]

Introducing $E_{s,s}$ by

\[ E_{s,s} = \int_0^1 r^2(u)\,u^s\,du, \]

we have

\[ E(x_{i|n}^2) = i \binom{n}{i} \sum_{a=0}^{i-1} (-1)^a \binom{i-1}{a} E_{n-i+a,\ n-i+a} \]

and, in particular,

\[ E(x_{2|5}^2) = 20E_{3,3} - 20E_{4,4}. \]

Introducing $E_s$ by

\[ E_s = \int_0^1 r(u)\,u^s\,du, \]

we have

\[ E(x_{i|n}) = i \binom{n}{i} \sum_{a=0}^{i-1} (-1)^a \binom{i-1}{a} E_{n-i+a} \]

and, in particular,

\[ E(x_{3|5}) = 30E_2 - 60E_3 + 30E_4. \]

Thus the computation of the desired means, variances, and covariances is
reduced to the computation of the integrals $E_s$, $E_{s,s}$, and $E_{s,t}$.

We shall also want to calculate the asymptotic approximations to the means,
variances, and covariances of the order statistics. For the uniform distribution,
it is well known that

\[ \operatorname{mean}(u_{i|n}) = \frac{n - i + 1}{n + 1}, \qquad \operatorname{var}(u_{i|n}) = \frac{i(n - i + 1)}{(n + 1)^2 (n + 2)}, \]

\[ \operatorname{cov}(u_{i|n}, u_{j|n}) = \frac{i(n - j + 1)}{(n + 1)^2 (n + 2)}, \qquad (i < j). \]

These asymptotic formulas are transformed from u to x by the relations
x = r(u) and dx = r'(u) du, giving

\[ \text{approx mean}(x_{i|n}) = r\!\left(\frac{n - i + 1}{n + 1}\right), \]

\[ \text{approx var}(x_{i|n}) = \left[r'\!\left(\frac{n - i + 1}{n + 1}\right)\right]^2 \frac{i(n - i + 1)}{(n + 1)^2 (n + 2)}, \]

\[ \text{approx cov}(x_{i|n}, x_{j|n}) = r'\!\left(\frac{n - i + 1}{n + 1}\right) r'\!\left(\frac{n - j + 1}{n + 1}\right) \frac{i(n - j + 1)}{(n + 1)^2 (n + 2)}, \qquad (i < j); \]

as noted above, in our calculations we have replaced n + 2 by n in the
denominator.


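6. Reduction of integrals for the special case. When the representing function is 

\[
x = r(u) = (1-u)^{-\lambda} - u^{-\lambda} \qquad (\lambda > 0),
\]

we obtain a symmetrical distribution with long tails. (For the normal distribution $r(u) = o(\ln u)$ as $u \to 0$.) The integrals we want are 

\[
E_s = \int_0^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}\, u^s \, du,
\]
\[
E_{s,s} = \int_0^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}^2 u^s \, du,
\]
\[
E_{s,t} = \int_0^1 \int_v^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}\{(1-v)^{-\lambda} - v^{-\lambda}\}\, u^s v^t \, du \, dv,
\]

which can be expressed as 

\[
E_s = A_s(\lambda) - B_s(\lambda),
\]
\[
E_{s,s} = A_{s,s}(\lambda) - 2B_{s,s}(\lambda) + C_{s,s}(\lambda),
\]
\[
E_{s,t} = A_{s,t}(\lambda) - B_{s,t}(\lambda) - C_{s,t}(\lambda) + D_{s,t}(\lambda),
\]

where 

\[
A_s(\lambda) = \int_0^1 (1-u)^{-\lambda} u^s \, du = b(-\lambda, s),
\]
\[
B_s(\lambda) = \int_0^1 u^{-\lambda} u^s \, du = \frac{1}{s+1-\lambda},
\]
\[
A_{s,s}(\lambda) = \int_0^1 (1-u)^{-2\lambda} u^s \, du = b(-2\lambda, s),
\]
\[
B_{s,s}(\lambda) = \int_0^1 (1-u)^{-\lambda} u^{-\lambda} u^s \, du = b(-\lambda, s-\lambda),
\]
\[
C_{s,s}(\lambda) = \int_0^1 u^{-2\lambda} u^s \, du = \frac{1}{s+1-2\lambda},
\]
\[
A_{s,t}(\lambda) = \int_0^1 \int_v^1 (1-u)^{-\lambda}(1-v)^{-\lambda} u^s v^t \, du \, dv
= \sum_{i=0}^{s} (-1)^i \binom{s}{i} \frac{b(i+1-2\lambda,\, t)}{i+1-\lambda},
\]
\[
B_{s,t}(\lambda) = \int_0^1 \int_v^1 (1-u)^{-\lambda} v^{-\lambda} u^s v^t \, du \, dv
= \sum_{i=0}^{s} (-1)^i \binom{s}{i} \frac{b(i+1-\lambda,\, t-\lambda)}{i+1-\lambda}
= \frac{b(s+t+1-\lambda,\, -\lambda)}{t+1-\lambda},
\]
\[
C_{s,t}(\lambda) = \int_0^1 \int_v^1 u^{-\lambda}(1-v)^{-\lambda} u^s v^t \, du \, dv
= \frac{1}{s+1-\lambda}\{b(-\lambda, t) - b(s+t+1-\lambda,\, -\lambda)\},
\]
\[
D_{s,t}(\lambda) = \int_0^1 \int_v^1 u^{-\lambda} v^{-\lambda} u^s v^t \, du \, dv
= \frac{1}{(t+1-\lambda)(s+t+2-2\lambda)},
\]

where throughout 

\[
b(p, q) = \frac{p!\,q!}{(p+q+1)!} = \frac{\Gamma(p+1)\Gamma(q+1)}{\Gamma(p+q+2)} = B(p+1, q+1).
\]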
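As a rough numerical companion to these reductions, the single integrals and the two forms of $B_{s,t}$ can be evaluated with the gamma function. This is a minimal sketch, not the authors' computing scheme; the function names are illustrative, and it assumes $\lambda < 1/2$ so that every gamma argument is positive.

```python
from math import comb, gamma


def b(p, q):
    """b(p, q) = Gamma(p+1) Gamma(q+1) / Gamma(p+q+2) = B(p+1, q+1)."""
    return gamma(p + 1) * gamma(q + 1) / gamma(p + q + 2)


def E_s(s, lam):
    """E_s = A_s - B_s."""
    return b(-lam, s) - 1.0 / (s + 1 - lam)


def E_ss(s, lam):
    """E_{s,s} = A_{s,s} - 2 B_{s,s} + C_{s,s}."""
    return b(-2 * lam, s) - 2 * b(-lam, s - lam) + 1.0 / (s + 1 - 2 * lam)


def B_st_sum(s, t, lam):
    """B_{s,t} written as a finite alternating sum."""
    return sum(
        (-1) ** i * comb(s, i) * b(i + 1 - lam, t - lam) / (i + 1 - lam)
        for i in range(s + 1)
    )


def B_st_closed(s, t, lam):
    """B_{s,t} in closed form, obtained by integrating over v first."""
    return b(s + t + 1 - lam, -lam) / (t + 1 - lam)
```

Since the distribution is symmetrical, `E_s(0, lam)` should vanish, and the two expressions for $B_{s,t}$ should agree to rounding error; both make convenient checks on a hand computation.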


7. Calculations for the special distribution. The computations for the special 
distribution were made from the formulas in the preceding section. The quantities 
$A$, $B$, $C$, $D$ were computed from $r = s = 0$ to $r + s = 8$, whence the values 
of $E_s$, $E_{s,s}$, $E_{s,t}$ were calculated. The values of the means, variances, and 
covariances were then obtained from the formulas of section 3. 

The means, variances, and covariances are believed to be accurate to the five 
decimal places given. 


8. Formulas and accuracy for the uniform. The means, variances, and 
covariances of the uniform are given near the end of section 5. Since $r(u) = u$, 
they are also the values given by the asymptotic approximation, when $n + 2$ 
is used. 

The tabulated values were computed to six decimal places and rounded to the 
four or five decimals given. 


REFERENCES 

[1] R. A. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, London, 1943. 

[2] Frederick Mosteller, "On some useful 'inefficient' statistics," Annals of Math. Stat., Vol. 17 (1946), pp. 377-408. 



SEQUENTIAL CONFIDENCE INTERVALS FOR THE MEAN OF A NORMAL 
DISTRIBUTION WITH KNOWN VARIANCE 

By Charles Stein and Abraham Wald 
Columbia University 

1. Summary. We consider sequential procedures for obtaining confidence 
intervals of prescribed length and confidence coefficient for the mean of a normal 
distribution with known variance. A procedure achieving these aims is called 
optimum if it minimizes the least upper bound (with respect to the mean) of the 
expected number of observations. The result proved is that the usual non¬ 
sequential procedure is optimum. 

2. Introduction. The problem of sequential confidence sets in general has 
been considered briefly by one of the authors [1]. Let $\{X_i\}$, $(i = 1, 2, \cdots)$, 
be a sequence of random variables whose distribution is specified except for the 
value of a parameter $\theta$ whose range is a space $\Omega$. Sequential confidence sets are 
determined by a rule as to when to stop sampling, together with a function of 
the sample whose value is one of a specified class of subsets of $\Omega$. The class of 
subsets is chosen in advance depending on the purpose of the estimation. For 
example, it may be the class of all intervals of prescribed length or the class of 
all sets whose diameter does not exceed a given value. It is required that the 
probability that this (random) set covers $\theta$ should be greater than or equal to a 
specified confidence coefficient $\alpha$ for all $\theta$. A procedure for finding sequential 
confidence intervals is considered optimum if it minimizes some specified function 
of the expected numbers of observations. Here this function is taken to be the 
least upper bound. In contrast with the result of this paper, a case where 
sequential confidence intervals may have an advantage over non-sequential 
procedures has been given by one of the authors [2]. The $X_i$ are independently 
normally distributed with unknown mean and unknown variance, and the 
problem is to find confidence intervals of fixed length for the unknown mean. As 
was first shown by Dantzig [3] this cannot be accomplished by a non-sequential 
procedure. Another case where this is true is the problem of finding confidence 
intervals of the form $(p_0, kp_0)$, where $k$ is a specified number greater than 1, for 
the probability in a binomial distribution. 

Let $\{X_i\}$, $(i = 1, 2, \cdots)$, be independently normally distributed with 
unknown mean $\xi$ and known variance $\sigma^2$. It is desired to specify a sequential 
procedure for obtaining confidence intervals of fixed length $l$ for the mean $\xi$. 
This is provided by a rule according to which at each stage of the experiment, 
after obtaining the first $m$ observations $X_1, \cdots, X_m$ for each integral value $m$, 
one makes one of the following decisions: 

a) Take an $(m + 1)$st observation. 

b) Terminate the procedure and state that the mean lies in the interval 
$(Y - \tfrac{1}{2}l,\ Y + \tfrac{1}{2}l)$, where $Y = \varphi_m(X_1, \cdots, X_m)$, $\varphi_m$ being a measurable 
real-valued function. 

The serial number $m$ of the observation on which the procedure 
terminates is, of course, a random variable and will be denoted by $n$. 

For any relation $R$ the symbol $P\{R \mid \xi\}$ will denote the probability that $R$ 
holds when $\xi$ is the true mean of $X_i$. The confidence coefficient of a sequential 
procedure $S$ is defined by 


(1) $\alpha(S) = \operatorname*{g.l.b.}_{\xi} P\{Y - \tfrac{1}{2}l < \xi < Y + \tfrac{1}{2}l \mid \xi\}.$ 

Denote by $n_0(S)$ the maximum expected number of observations, i.e. 

(2) $n_0(S) = \operatorname*{l.u.b.}_{\xi} E(n \mid \xi, S)$ 

where $E(n \mid \xi, S)$ denotes the expected value of $n$ when $\xi$ is the true mean and the 
procedure $S$ is used. 

A procedure $S$ will be considered optimum if, for all $S'$ such that $\alpha(S') = \alpha(S)$, 

(3) $n_0(S) \le n_0(S').$ 

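It will be shown that an optimum procedure $S(\nu, c)$ can be obtained as follows: 

a) For all $m < \nu$, $\nu$ a fixed positive integer, take another observation. 

b) For $m = \nu$, terminate the procedure if 

(4) $\sum_{i=1}^{\nu} X_i^2 - \frac{1}{\nu}\left(\sum_{i=1}^{\nu} X_i\right)^2 > c\sigma^2;$ 

and let $Y = \frac{1}{\nu}\sum_{i=1}^{\nu} X_i$. (The inequality (4) is used merely as a device for fixing 
the probability of taking $\nu$ observations, this random event to be independent 
of whether $(Y - \tfrac{1}{2}l,\ Y + \tfrac{1}{2}l)$ covers $\xi$, given $\nu$.) 

c) Otherwise take a $(\nu + 1)$st observation, terminating the process, and let 

\[
Y = \frac{1}{\nu + 1}\sum_{i=1}^{\nu+1} X_i.
\]

When $c = 0$, this is the usual non-sequential procedure. 

Clearly, 

(5) $\alpha[S(\nu, c)] = P\{\chi^2_{\nu-1} > c\}\, H\!\left(\frac{l\sqrt{\nu}}{2\sigma}\right) + [1 - P\{\chi^2_{\nu-1} > c\}]\, H\!\left(\frac{l\sqrt{\nu+1}}{2\sigma}\right),$ 

where 

(6) $H(u) = \frac{1}{\sqrt{2\pi}}\int_{-u}^{u} e^{-t^2/2}\, dt.$ 

Also 

(7) $n_0[S(\nu, c)] = \nu + 1 - P\{\chi^2_{\nu-1} > c\}.$ 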
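To make the confidence coefficient and maximum expected sample size of the procedure concrete, here is a small sketch. It assumes $H(u)$ is the probability that a standard normal variable lies in $(-u, u)$, and it takes the stopping probability $q = P\{\chi^2_{\nu-1} > c\}$ directly as an input rather than computing it from $c$; the function names are illustrative only.

```python
from math import erf, sqrt


def H(u):
    """P(-u < Z < u) for a standard normal Z."""
    return erf(u / sqrt(2.0))


def confidence_coefficient(nu, q, length, sigma=1.0):
    """Mixture of the coverage with nu and with nu + 1 observations.

    q is the probability of stopping after nu observations.
    """
    stop_now = H(length * sqrt(nu) / (2.0 * sigma))
    one_more = H(length * sqrt(nu + 1) / (2.0 * sigma))
    return q * stop_now + (1.0 - q) * one_more


def max_expected_observations(nu, q):
    """nu observations with probability q, nu + 1 otherwise."""
    return nu + 1 - q


# q = 1 corresponds to c = 0, the usual fixed-sample procedure with nu observations.
alpha_fixed = confidence_coefficient(nu=4, q=1.0, length=1.96, sigma=1.0)
```

With `nu=4`, `length=1.96`, and `sigma=1` the fixed-sample case gives a half-length of 1.96 standard errors, so the coefficient is close to 0.95; lowering `q` moves both the coefficient and the expected sample size continuously toward the values for `nu + 1`.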

By a proper choice of $\nu$ and $c$ we can achieve any desired confidence coefficient 
$\alpha \ge H\!\left(\frac{l}{2\sigma}\right)$. There is no essential loss of generality in considering only the 
case $\sigma = 1$, and this will be done in the remainder of this paper. 

3. A lower bound for $n_0(S)$ and an upper bound for $\alpha(S)$. Consider any 
sequential procedure $S$ for obtaining confidence intervals of length $l$. Put 

(8) $\alpha(\xi, S) = P\{Y - \tfrac{1}{2}l < \xi < Y + \tfrac{1}{2}l \mid \xi\}.$ 

That is, $\alpha(\xi, S)$ is the probability that the confidence interval will cover the true 
mean $\xi$ when the procedure $S$ is used. According to (1), 

(9) $\alpha(S) = \operatorname*{g.l.b.}_{\xi} \alpha(\xi, S).$ 

In order to obtain a lower bound for $n_0(S)$ and an upper bound for $\alpha(S)$, we 
suppose that the procedure $S$ is applied when $\xi$ is not a fixed number but a 
random variable normally distributed with mean 0 and variance $\sigma^2$. Then the 
probability that the confidence interval covers $\xi$ is 

(10) $\alpha(\sigma, S) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\xi^2/2\sigma^2}\, \alpha(\xi, S)\, d\xi \ge \alpha(S)$ 

and the expected number of observations is 

(11) $E(n \mid \sigma, S) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\xi^2/2\sigma^2}\, E(n \mid \xi, S)\, d\xi \le n_0(S).$ 

Let $p_m(\xi, S)$, $(m = 1, 2, \cdots, \text{ad inf.})$, denote the probability that $n = m$ 
when $\xi$ is the true mean and procedure $S$ is used. Put 

(12) $p_m(\sigma, S) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\xi^2/2\sigma^2}\, p_m(\xi, S)\, d\xi.$ 

Since 

(13) $E(n \mid \sigma, S) = \sum_{m=1}^{\infty} m\, p_m(\sigma, S)$ 

we obtain from (11) 

(14) $\sum_{m=1}^{\infty} m\, p_m(\sigma, S) \le n_0(S).$ 

We shall now derive an upper bound for $\alpha(\sigma, S)$. Since $X_i = \xi + \epsilon_i$ where the 
$\epsilon_i$ are independently normally distributed with mean 0 and variance 1, the joint 
distribution of $\xi$ and $X_i$, $(i = 1, \cdots, m)$, is a multivariate normal distribution 
with 

(15) $E\xi = EX_i = 0$ 



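and covariance matrix 

(16) $\begin{pmatrix} \sigma^2 & \sigma^2 & \cdots & \sigma^2 \\ \sigma^2 & \sigma^2 + 1 & \cdots & \sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2 & \sigma^2 & \cdots & \sigma^2 + 1 \end{pmatrix}.$ 

Thus the conditional distribution of $\xi$ given $X_1, \cdots, X_m$ is normal with mean 

(17) $E(\xi \mid X_1, \cdots, X_m) = (\sigma^2, \cdots, \sigma^2) \begin{pmatrix} \sigma^2 + 1 & \cdots & \sigma^2 \\ \vdots & \ddots & \vdots \\ \sigma^2 & \cdots & \sigma^2 + 1 \end{pmatrix}^{-1} \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix}$ 

\[
= \sigma^2 (1, 1, \cdots, 1) \begin{pmatrix} \frac{(m-1)\sigma^2 + 1}{m\sigma^2 + 1} & \cdots & -\frac{\sigma^2}{m\sigma^2 + 1} \\ \vdots & \ddots & \vdots \\ -\frac{\sigma^2}{m\sigma^2 + 1} & \cdots & \frac{(m-1)\sigma^2 + 1}{m\sigma^2 + 1} \end{pmatrix} \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix}
= \frac{\sigma^2}{m\sigma^2 + 1} \sum_{i=1}^{m} X_i
\]

and variance 

(18) $\sigma^2 - \frac{\sigma^4}{(m\sigma^2 + 1)^2}\, E\left(\sum_{i=1}^{m} X_i\right)^2 = \frac{\sigma^2}{m\sigma^2 + 1}.$ 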
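The matrix inverse used for the conditional mean can be checked numerically for small $m$. The sketch below assumes the covariance matrix of $X_1, \cdots, X_m$ has $\sigma^2 + 1$ on the diagonal and $\sigma^2$ elsewhere, and that its inverse has the entries quoted in the conditional-mean formula; the helper names are hypothetical.

```python
def posterior_variance(m, sigma2):
    """Conditional variance of xi given m observations (prior variance sigma2)."""
    return sigma2 / (m * sigma2 + 1.0)


def inverse_check(m, sigma2):
    """Largest entry of |Cov * ClaimedInverse - Identity| for the m x m case."""
    k = sigma2 / (m * sigma2 + 1.0)
    eye = [[1.0 if a == b else 0.0 for b in range(m)] for a in range(m)]
    cov = [[sigma2 + eye[a][b] for b in range(m)] for a in range(m)]
    inv = [[eye[a][b] - k for b in range(m)] for a in range(m)]
    worst = 0.0
    for a in range(m):
        for b in range(m):
            entry = sum(cov[a][c] * inv[c][b] for c in range(m))
            worst = max(worst, abs(entry - eye[a][b]))
    return worst


def variance_from_matrix(m, sigma2):
    """sigma2 minus the quadratic form (sigma2,...,sigma2) Inv (sigma2,...,sigma2)'."""
    k = sigma2 / (m * sigma2 + 1.0)
    quad = sigma2 * sigma2 * (m - m * m * k)  # sum of all entries of Inv is m - m^2 k
    return sigma2 - quad
```

Both routes to the conditional variance should agree with $\sigma^2/(m\sigma^2 + 1)$, which is the quantity that later fixes the argument of $H$.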

If $X_1, \cdots, X_m$ is a sequence for which the process is terminated on the $m$th 
trial, the conditional probability that the interval of length $l$ will cover $\xi$ is 
clearly maximized by taking 

(19) $Y = E(\xi \mid X_1, \cdots, X_m) = \frac{\sigma^2}{m\sigma^2 + 1} \sum_{i=1}^{m} X_i$ 

and, by (18) this probability has the value $H(c_m)$ where $H$ is defined by (6) and 

(20) $c_m = \frac{l}{2}\sqrt{\frac{m\sigma^2 + 1}{\sigma^2}} = \frac{l}{2}\sqrt{m + \sigma^{-2}}.$ 

Hence, 

(21) $\alpha(\sigma, S) \le \sum_{m=1}^{\infty} p_m(\sigma, S)\, H(c_m).$ 

From this and (10) we obtain 

(22) $\alpha(S) \le \sum_{m=1}^{\infty} p_m(\sigma, S)\, H(c_m).$ 

This upper limit of $\alpha(S)$ and the lower limit of $n_0(S)$ given in (14) will be used 
later to prove that $S(\nu, c)$ is an optimum procedure. 

4. Maximum value of $\sum_{m=1}^{\infty} p_m(\sigma, S) H(c_m)$ subject to the condition that 
$\sum_{m=1}^{\infty} m\, p_m(\sigma, S)$ does not exceed a given bound. We shall show that the maximum 
of $\sum_{m=1}^{\infty} p_m(\sigma, S) H(c_m)$ subject to 

\[
E(n \mid \sigma, S) = \sum_{m=1}^{\infty} m\, p_m(\sigma, S) \le \nu + a,
\]

where $\nu$ is a positive integer and $0 \le a < 1$, is obtained by choosing $p_m(\sigma, S) = p_m^*$ defined by 

(23) $p_m^* = 0$ for $m < \nu$ or $m > \nu + 1$, $\quad p_\nu^* = 1 - a$, $\quad p_{\nu+1}^* = a.$ 

For, suppose to the contrary that there exists a sequence $\{p_m\}$ such that the 
following conditions hold: 

(24) $p_m \ge 0, \quad \sum_{m=1}^{\infty} p_m = 1, \quad \sum_{m=1}^{\infty} m\, p_m \le \nu + a = \sum_{m=1}^{\infty} m\, p_m^*, \quad \sum_{m=1}^{\infty} p_m H(c_m) > \sum_{m=1}^{\infty} p_m^* H(c_m).$ 

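We have 

(25) $H(c_m) = \frac{1}{\sqrt{2\pi}} \int_0^{c_m^2} y^{-1/2} e^{-y/2}\, dy.$ 

Put 

(26) $C = H(c_{\nu+1}) - H(c_\nu) = \frac{1}{\sqrt{2\pi}} \int_{c_\nu^2}^{c_{\nu+1}^2} y^{-1/2} e^{-y/2}\, dy.$ 

With the aid of $p_\nu = 1 - \sum_{m \ne \nu} p_m$, we obtain from the last two inequalities 
in (24) 

(27) $0 < \sum_{m=1}^{\infty} (p_m - p_m^*) H(c_m) - C \sum_{m=1}^{\infty} (p_m - p_m^*)\, m = \sum_{m \ne \nu} (p_m - p_m^*) K_m$ 

where 

(28) $K_m = H(c_m) - H(c_\nu) - (m - \nu)[H(c_{\nu+1}) - H(c_\nu)].$ 

Clearly $K_{\nu+1} = 0$. Also, for $m < \nu$, since the integrand is a strictly decreasing 
function of $y$ (and $c_{m+1}^2 - c_m^2 = l^2/4$ for every $m$), 

(29) $\sqrt{2\pi}\, K_m = (\nu - m) \int_{c_\nu^2}^{c_{\nu+1}^2} y^{-1/2} e^{-y/2}\, dy - \int_{c_m^2}^{c_\nu^2} y^{-1/2} e^{-y/2}\, dy$ 

\[
< (\nu - m)\, \frac{l^2}{4} \left[ y^{-1/2} e^{-y/2} \right]_{y = c_\nu^2} - (\nu - m)\, \frac{l^2}{4} \left[ y^{-1/2} e^{-y/2} \right]_{y = c_\nu^2} = 0.
\]

Similarly for $m > \nu + 1$, $K_m < 0$. But $p_m^* = 0$ for $m \ne \nu, \nu + 1$ so that 

(30) $\sum_{m \ne \nu,\, \nu+1} (p_m - p_m^*) K_m \le 0$ 

which contradicts (27) since $K_{\nu+1} = 0$. 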

Thus, we have shown that the inequality 

(31) $E(n \mid \sigma, S) \le \nu + a$ 

implies the inequality 

(32) $\sum_{m=1}^{\infty} p_m(\sigma, S) H(c_m) \le (1 - a) H(c_\nu) + a H(c_{\nu+1}).$ 
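The sign pattern of the quantities $K_m$ that drives this maximization can be inspected numerically. The sketch below assumes $c_m = \tfrac{1}{2} l \sqrt{m + \sigma^{-2}}$, which is the value implied by the conditional variance $\sigma^2/(m\sigma^2 + 1)$, and $H(u) = P(-u < Z < u)$ for standard normal $Z$; the function names are illustrative.

```python
from math import erf, sqrt


def H(u):
    """P(-u < Z < u) for a standard normal Z."""
    return erf(u / sqrt(2.0))


def c(m, length, sigma):
    """Assumed form of c_m: half-length divided by the conditional standard deviation."""
    return 0.5 * length * sqrt(m + 1.0 / sigma ** 2)


def K(m, nu, length, sigma):
    """K_m = H(c_m) - H(c_nu) - (m - nu) [H(c_{nu+1}) - H(c_nu)]."""
    h_m = H(c(m, length, sigma))
    h_nu = H(c(nu, length, sigma))
    h_next = H(c(nu + 1, length, sigma))
    return h_m - h_nu - (m - nu) * (h_next - h_nu)


def coverage_bound(nu, a, length, sigma):
    """Right-hand side of the bound: (1 - a) H(c_nu) + a H(c_{nu+1})."""
    return (1 - a) * H(c(nu, length, sigma)) + a * H(c(nu + 1, length, sigma))


# K_m should vanish at m = nu and m = nu + 1 and be negative elsewhere.
values = [K(m, nu=5, length=1.0, sigma=2.0) for m in range(1, 41)]
```

Because $H(\sqrt{y})$ is concave in $y$ and $c_m^2$ is linear in $m$, $K_m$ is the gap between a concave function and its chord through $m = \nu$ and $m = \nu + 1$, which is why every other $K_m$ comes out negative.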

5. Proof that $S(\nu, c)$ is an optimum procedure. Since, according to (14) 
and (22) 

(33) $n_0(S) \ge E(n \mid \sigma, S)$ and $\alpha(S) \le \sum_{m=1}^{\infty} p_m(\sigma, S) H(c_m),$ 

it follows from the result expressed in (31) and (32) that, for any procedure $S$ 
satisfying the inequality 

(34) $n_0(S) \le \nu + a,$ 

we must have 

(35) $\alpha(S) \le (1 - a) H(c_\nu) + a H(c_{\nu+1})$ 

identically in $\sigma$. Since $H(u)$ is continuous, it follows that 

(36) $\alpha(S) \le (1 - a) H\!\left(\frac{l\sqrt{\nu}}{2}\right) + a H\!\left(\frac{l\sqrt{\nu+1}}{2}\right)$ 

for any procedure $S$ satisfying (34). 



The right hand side of (36) is $\alpha[S(\nu, c)]$ where $c$ is chosen so that 

(37) $1 - a = P\{\chi^2_{\nu-1} > c\}.$ 

We use an indirect proof to show that $S(\nu, c)$ is an optimum procedure. 
Suppose to the contrary that there is a procedure $S'$ such that 

(38) $\alpha(S') = \alpha[S(\nu, c)]$ 

but 

(39) $n_0(S') < n_0[S(\nu, c)].$ 

By (5) and (7), $\alpha[S(\nu, c)]$ is a continuous strictly increasing function of 

\[
\nu + 1 - P\{\chi^2_{\nu-1} > c\}
\]

and this latter is $n_0[S(\nu, c)]$. If we choose $\nu'$, $c'$ so that 

(40) $n_0(S') \le \nu' + 1 - P\{\chi^2_{\nu'-1} > c'\} < \nu + 1 - P\{\chi^2_{\nu-1} > c\},$ 

it follows that 

(41) $\alpha[S(\nu', c')] < \alpha[S(\nu, c)] = \alpha(S').$ 

But (41) and the first part of (40) contradict the result expressed in (34) and 
(36). 


REFERENCES 

[1] A. Wald, Sequential Analysis, John Wiley and Sons, 1947, section 11.2. 

[2] Charles Stein, "A two-sample test for a linear hypothesis whose power is independent 
of the variance", Annals of Math. Stat., Vol. 16 (1945), pp. 243-258. 

[3] G. B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having power 
functions independent of σ", Annals of Math. Stat., Vol. 11 (1940), p. 186. 



NOTES 

This section is devoted to brief research and expository articles on methodology 
and other short items. 

A USEFUL CONVERGENCE THEOREM FOR 
PROBABILITY DISTRIBUTIONS 

By Henry Scheffé 

University of California at Los Angeles 

In problems of establishing limiting distributions it is often apparent that the 
probability density $p_n(x)$ of a random variable $X_n$ has a limit $p(x)$; throughout 
this paper $n = 1, 2, 3, \cdots$, and all limits are taken as $n \to \infty$. If $p(x)$ is the 
density of a random variable $X$, what we really care about then is whether the 
limits apply to probabilities, which involve integrals of the densities: Does 
$\lim \Pr\{X_n \text{ in } S\} = \Pr\{X \text{ in } S\}$ for all¹ Borel sets $S$, or, does 

(1) $\lim \int_S p_n(x)\, dx = \int_S p(x)\, dx\,?$ 

The question is thus one of taking a limit under an integral sign. Perhaps the 
most widely used justification of such a process is the following theorem of 
Lebesgue [1, p. 47; 2, p. 29]: If for a sequence $\{f_n(x)\}$ of integrable functions, 
$\lim f_n(x) = f(x)$ for almost all $x$ in $S$, then a sufficient condition that 

\[
\lim \int_S f_n(x)\, dx = \int_S f(x)\, dx
\]

is that there exist an integrable function $g(x)$ which uniformly dominates the 
sequence $\{f_n(x)\}$, that is, $|f_n(x)| \le g(x)$ for all $n$ and all $x$ in $S$, and $\int_S g(x)\, dx < \infty$. 

For example, in the excellent new treatise by Cramér the limiting form of the 
$t$-distribution is treated as follows [1, p. 252; other examples on pp. 369, 
371]: For $n$ degrees of freedom the $t$-variable has the density 

(2) $p_n(x) = c_n (1 + x^2/n)^{-\frac{1}{2}(n+1)},$ 

where 

(3) $c_n = (n\pi)^{-\frac{1}{2}}\, \Gamma(\tfrac{1}{2}(n + 1)) / \Gamma(\tfrac{1}{2}n).$ 

It is shown fairly easily that $\lim p_n(x) = p(x)$, the density of $N(0, 1)$, where 
$N(m, \sigma^2)$ denotes the normal distribution with mean $m$ and variance $\sigma^2$. Then 
to prove 

\[
\lim \int_{-\infty}^{\xi} p_n(x)\, dx = \int_{-\infty}^{\xi} p(x)\, dx
\]

Cramér shows that $\{p_n(x)\}$ is uniformly dominated by an integrable function. 
It is instructive to consider some examples where 

(4) $\lim \int_{-\infty}^{\xi} p_n(x)\, dx$ 

does not equal 

(5) $\int_{-\infty}^{\xi} \lim p_n(x)\, dx.$ 

¹ In defining the convergence of a sequence of distributions to the distribution of a 
discontinuous random variable $X$ it is desirable to modify this requirement so that it is 
demanded only of sets $S$ which are continuity intervals of $X$ [1, p. 83]. We are concerned here 
however only with the "absolutely continuous case" where $X$ has a probability density $p(x)$. 


In the examples (i), (ii), (iii), $\lim p_n(x) = 0$ for all $x$ and hence (5) is zero for 
all $\xi$. 

(i) $p_n(x) = 1$ for $-n - 1 < x < -n$, zero elsewhere. Then (4) equals 1 for all $\xi$. 

(ii) $p_n(x) = 1/n$ for $-\tfrac{1}{2}n < x < \tfrac{1}{2}n$, zero elsewhere. Here (4) equals $\tfrac{1}{2}$ for all $\xi$. 

(iii) $p_n(x) = 2n^2 x$ for $0 < x < 1/n$, zero elsewhere. Now (4) is zero for 
$\xi \le 0$, unity for $\xi > 0$. 

An example in which $\lim p_n(x) \ne 0$ is 

(iv) $p_n(x) = \tfrac{1}{2}[h_n(x) + p_0(x)]$, where $h_n$ is the $p_n$ of one of the above examples 
and $p_0$ is a fixed density. Then $\lim p_n(x) = \tfrac{1}{2} p_0(x)$. Now (4) exceeds (5) by 
half the amount it did in the corresponding above example. 

The essential features of these examples could be obtained with normal 
distributions but would involve a little more computation, for instance, $N(-n, 1)$, 
$N(0, n^2)$, $N(1/n, 1/n^4)$, for examples (i), (ii), (iii), respectively. 
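A short numerical sketch may help fix ideas for examples (ii) and (iii). It uses the closed-form distribution functions of those two densities, so nothing is integrated numerically; the function names are illustrative only.

```python
def cdf_example_ii(n, xi):
    """Integral up to xi of the uniform density 1/n on (-n/2, n/2)."""
    return min(1.0, max(0.0, (xi + n / 2.0) / n))


def cdf_example_iii(n, xi):
    """Integral up to xi of the density 2 n^2 x on (0, 1/n)."""
    if xi <= 0:
        return 0.0
    if xi >= 1.0 / n:
        return 1.0
    return (n * xi) ** 2


# For large n the integrals settle at 1/2 (example ii) and at 0 or 1 (example iii),
# although the densities themselves tend to zero at every fixed x.
limits = [cdf_example_ii(10**6, 7.0), cdf_example_iii(10**6, 0.01), cdf_example_iii(10**6, -0.01)]
```

In both cases the limit of the integrals is not the integral of the limit, which is identically zero.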

We note that in none of these examples is lim p n (x) a density. This suggests 
that the trouble might perhaps be prevented by requiring that lim p n (x) be a 
density—which happens in the case from which we started. This surmise is 
correct. We may formalize the situation as follows: 

Definition. A function $f(x)$ will be called a density if it is non-negative and 
$\int_R f(x)\, dx = 1$. Here $R$ denotes the whole space of $x$. 

The reader may think of a univariate density, where $x$ is a real variable and 
$R$ is the real axis, but theorem and proof run the same for a $k$-variate density, 
where $x$ is a point in a $k$-dimensional Euclidean space $R$. 

Theorem.² If for a sequence $\{p_n(x)\}$ of densities 

\[
\lim p_n(x) = p(x)
\]

for almost all $x$ in $R$, then a sufficient condition that 

\[
\lim \int_S p_n(x)\, dx = \int_S p(x)\, dx,
\]

uniformly for all Borel sets $S$ in $R$, is that $p(x)$ be a density. 

Proof. Let us write the difference 

(6) $p_n(x) - p(x) = \delta_n(x).$ 

Then 

(7) $\delta_n(x) \to 0$ 

for almost all $x$ in $R$. Also 

(8) $\int_S \delta_n\, dx = \int_S p_n\, dx - \int_S p\, dx,$ 

and so it suffices to prove that $\int_S \delta_n\, dx \to 0$ uniformly for all $S$ in $R$, where $S$ 
henceforth denotes a Borel set. 

If in (8) we let $S = R$ we get 

(9) $\int_R \delta_n\, dx = 0$ 

since $p_n$ and $p$ are densities. We now split the difference $\delta_n(x)$ into its positive 
and negative parts: Let 

(10) $\delta_n^+ = \tfrac{1}{2}(\delta_n + |\delta_n|), \quad \delta_n^- = \tfrac{1}{2}(\delta_n - |\delta_n|),$ 

so that 

\[
\delta_n = \delta_n^+ + \delta_n^-, \quad \delta_n^+ \ge 0, \quad \delta_n^- \le 0.
\]

From (7) and (10), we find 

(11) $\delta_n^- \to 0$ 

for almost all $x$ in $R$, and from (9), 

(12) $\int_R \delta_n^+\, dx + \int_R \delta_n^-\, dx = 0.$ 

² The hypotheses of this theorem, while perfectly adapted to applications in probability 
and statistics, would not seem the "natural" ones in real variable or measure theory. 
Professor A. P. Morse has remarked to the writer that, if the theorem has not been stated in this 
form before, it is at least an easy corollary of some more general results known in that field. 
Nevertheless our direct proof based only on the familiar Lebesgue theorem and using only 
very simple manipulations may be of interest to readers of the Annals. Professor Morse 
also pointed out that the stronger result $\lim \int_S |p_n(x) - p(x)|\, dx = 0$ uniformly for all $S$, 
may be stated. This follows from our proof since 

\[
\int_S |p_n - p|\, dx = \int_S \delta_n^+\, dx - \int_S \delta_n^-\, dx.
\]



By virtue of (6), $\delta_n \ge -p$. Now if $\delta_n \le 0$, $\delta_n^- = \delta_n \ge -p$, and if $\delta_n > 0$, $\delta_n^- = 0 \ge -p$, 
and hence in every case $0 \ge \delta_n^- \ge -p$. Since we now have $|\delta_n^-(x)| \le p(x)$ 
and $\int_R p(x)\, dx = 1$, we may apply³ the Lebesgue theorem to get 

\[
\lim \int_R \delta_n^-\, dx = \int_R \lim \delta_n^-\, dx.
\]

The right member is zero because of (11). It then follows from (12) that 
$\lim \int_R \delta_n^+\, dx$ is also zero. The relations 

\[
0 \le \int_S \delta_n^+\, dx \le \int_R \delta_n^+\, dx \to 0,
\]
\[
0 \ge \int_S \delta_n^-\, dx \ge \int_R \delta_n^-\, dx \to 0
\]

guarantee that the quantities $\int_S \delta_n^+\, dx$ and $\int_S \delta_n^-\, dx$ have the limit zero uniformly 
for all $S$, and hence the same is true of their sum (8). 
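The theorem, and the stronger $L_1$ statement mentioned in the footnote, can be illustrated numerically for the $t$-density of example (2). The sketch below is only an illustration: it integrates $|p_n - p|$ by the trapezoidal rule over a finite range, and the range, step count, and function names are arbitrary choices rather than anything prescribed by the text.

```python
from math import exp, lgamma, log, log1p, pi, sqrt


def t_constant(n):
    """c_n = Gamma((n+1)/2) / (sqrt(n pi) Gamma(n/2)), computed through logarithms."""
    return exp(lgamma((n + 1) / 2.0) - lgamma(n / 2.0) - 0.5 * log(n * pi))


def t_density(x, n):
    """Density of the t-variable with n degrees of freedom."""
    return t_constant(n) * exp(-0.5 * (n + 1) * log1p(x * x / n))


def normal_density(x):
    """Density of N(0, 1)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def l1_distance(n, half_width=60.0, steps=24000):
    """Trapezoidal estimate of the integral of |p_n - p| over (-half_width, half_width)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(t_density(x, n) - normal_density(x))
    return total * h
```

The estimated distance shrinks as $n$ grows, and `t_constant(n)` drifts toward $(2\pi)^{-1/2}$, in line with the limiting density being normal.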

Returning to the example (2), we remark that it is practically obvious that the 
second factor on the right has the limit $e^{-\frac{1}{2}x^2}$, but it is not quite so obvious that 
$\lim c_n = (2\pi)^{-\frac{1}{2}}$. This situation is typical of many applications where it is 
more difficult to evaluate the limit of "the" constant than the limit of the 
remaining factors, and one wonders after obtaining the latter limit whether the 
constant is not automatically forced toward the limit desired for it, and whether 
the direct calculation of its limit could not be avoided. Let us put the question 
as follows: Suppose that 

\[
\{p_n(x) = c_n f_n(x)\}
\]

is a sequence of densities and that 

\[
p(x) = c f(x)
\]

is also a density. Then if $\lim f_n(x) = f(x)$ for almost all $x$, may we conclude 
that $\lim c_n = c$? If so, we could then apply the above theorem without having 
evaluated the limit of the constant or produced a dominating function. 
Unfortunately the answer to this question is no, as shown by example (iv) above: 

³ Although our proof rests on the Lebesgue convergence theorem, this theorem is applied 
to $\delta_n^-(x)$ and not to $p_n(x)$. While in most cases of practical interest the sequence $\{p_n(x)\}$ 
is uniformly dominated by an integrable function, it is possible to devise a simple example 
where this is not true and yet our theorem applies: Let $p_n(x) = 1$ for $1/(n + 1) < x < 1$ 
and for $a_n < x < a_{n+1}$, zero elsewhere, where $a_n = \sum_{i=1}^{n} 1/i$. Then $\sup_n p_n(x) = 1$ for 
all $x > 0$, nevertheless $\lim p_n(x)$ is a density, namely that of the uniform distribution on 
$(0, 1)$. 


If we let $f_n(x) = h_n(x) + p_0(x)$, and $f(x) = p_0(x)$, then $\lim f_n(x) = f(x)$, but 
$c_n = \tfrac{1}{2}$ and $c = 1$, hence $\lim c_n \ne c$. Employing the assumption that $p_n(x)$ 
and $p(x)$ are densities we see 

\[
1/c_n = \int_R f_n(x)\, dx, \qquad 1/c = \int_R f(x)\, dx,
\]

and hence $\lim c_n = c$ if and only if 

(13) $\lim \int_R f_n(x)\, dx = \int_R \lim f_n(x)\, dx.$ 

It follows that in such cases if we wish to establish a limiting distribution in the 
sense (1), we may either prove $\lim c_n = c$, or we may justify (13), say by 
producing a suitable dominating function, but we need not do both. No doubt the 
first alternative would be preferable at all but the most advanced levels of 
teaching or exposition. 


REFERENCES 

[1] H. Cramér, The Mathematical Methods of Statistics, Princeton Univ. Press, 1946. 
[2] S. Saks, Theory of the Integral, Stechert, New York, 1937. 


AN EXPLICIT REPRESENTATION OF A STATIONARY 
GAUSSIAN PROCESS 

By M. Kac 1 and A. J. F. Siegert 

Cornell University and Syracuse University 

1. In a paper which will soon appear in the Journal of Applied Physics [1] 
the authors have introduced methods of calculating certain probability dis¬ 
tributions which are of importance in the theory of random noise in radio re¬ 
ceivers. 

The complexity of the physical problem and occasional uses of heuristic reason¬ 
ings may have obscured some of the mathematical points. For this reason the 
authors felt that it may be worth while to illustrate one of the basic ideas on a 
simple but important example. 

2. A stationary Gaussian process is a one parameter family $x(t)$ of random 
variables such that: 

(a) $x(t)$ is normally distributed, the mean and the variance being 
independent of $t$; 

(b) the joint probability distribution of $x(t_1), x(t_2), \cdots, x(t_r)$ is multivariate 
Gaussian whose parameters depend only on the differences $t_j - t_k$. 

We assume, for the sake of simplicity, that the process is normalized, i.e., 

\[
E\{x(t)\} = 0, \qquad E\{x^2(t)\} = 1
\]

and we define the correlation function $\rho(\tau)$ by the usual formula 

\[
\rho(\tau) = E\{x(t)\, x(t + \tau)\}.
\]

It is then well known² that a distribution function $\sigma(u)$ exists such that for all $\tau$ 

(1) $\rho(\tau) = \int_{-\infty}^{\infty} \cos u\tau \, d\sigma(u).$ 

¹ John Simon Guggenheim Memorial Fellow. 


3. Let $0 \le s, t \le T$ and consider the symmetric kernel 

\[
K(s, t) = \rho(s - t).
\]

The fact that $\sigma(u)$ is non-decreasing implies that the kernel $\rho(s - t)$ is 
quasi-definite, i.e., for every $L^2$ function $g(t)$ on $(0, T)$ one has 

\[
\int_0^T \int_0^T g(s)\, \rho(s - t)\, g(t)\, ds\, dt \ge 0.
\]

Thus the eigenvalues of the integral equation 

(2) $\int_0^T \rho(s - t) f(t)\, dt = \lambda f(s)$ 

are non-negative. Moreover, denoting by $\lambda_j$ the eigenvalues and by $f_j(t)$ the 
corresponding normalized eigenfunctions of (2) we have by the classical theorem 
of Mercer (see [4], in particular part 6 of Ch. I) that 

(3) $\rho(s - t) = \sum_j \lambda_j f_j(s) f_j(t)$ 

where the series on the right is absolutely and uniformly convergent. It should 
be noted that in virtue of (1) $\rho(\tau)$ is a continuous function. 

4. Let now $G_1, G_2, G_3, \cdots$ be independent, normally distributed random 
variables each having mean 0 and variance 1. 

Consider the series 

(4) $\sum_j \sqrt{\lambda_j}\, G_j f_j(t).$ 

Since for each $t$ we have 

\[
\sum_j \left(\sqrt{\lambda_j}\, f_j(t)\right)^2 = \sum_j \lambda_j f_j^2(t) = \rho(0) = 1,
\]

we infer that for each $t$ the series (4) converges in the mean to a random variable 
$x(t)$. Moreover, by a theorem of Kolmogoroff [5], the series (4) converges, for 
each $t$, to $x(t)$ with probability 1. 

Thus we may write 

(5) $x(t) = \sum_j \sqrt{\lambda_j}\, G_j f_j(t).$ 

It is now easy to show that $x(t)$ thus defined is a stationary Gaussian process 
in $(0, T)$ with the correlation function $\rho(\tau)$. 

In fact, 

\[
E\{x(s)\, x(t)\} = \sum_j \lambda_j f_j(s) f_j(t) = \rho(s - t), \qquad 0 \le s, t \le T,
\]

and conditions (a) and (b) of section 2 follow from the well known properties of 
linear combinations of independent Gaussian random variables. Of course, 
we are dealing here with infinite linear combinations but the mean convergence 
noted above is sufficient to justify the extension to our case. 

² See [2]. The theorem in question (in a somewhat different form) seems to have been 
first established by N. Wiener in [3]. 

5. It is more illuminating to think of the random variables $G_j$ as measurable 
functions $G_j(\omega)$ defined on an abstract set $\Omega$ in which a Lebesgue measure has 
been established (the measure of the whole space being 1). 

The representation (5) can then be written in the equivalent form 

(6) $x(t, \omega) = \sum_j \sqrt{\lambda_j}\, G_j(\omega) f_j(t).$ 

The equality, as established in section 4, holds for every $t$ in the sense of mean 
convergence. Moreover, by the theorem of Kolmogoroff cited above, and by 
Fubini's theorem the equality (6) holds for almost every pair $(t, \omega)$, $(0 \le t \le T)$, 
in the sense of ordinary convergence. 

Furthermore by Mercer's theorem (remember that $\lambda_j \ge 0$) 

\[
\sum_j \lambda_j = \int_0^T \rho(s - s)\, ds = T
\]

and hence 

\[
\sum_j \lambda_j E\{G_j^2\} = \sum_j \lambda_j \int_\Omega G_j^2(\omega)\, d\omega = \sum_j \lambda_j = T < \infty.
\]

Thus 

\[
\sum_j \lambda_j G_j^2(\omega)
\]

converges for almost every $\omega$ and therefore the series 

(7) $\sum_j \sqrt{\lambda_j}\, G_j(\omega) f_j(t)$ 

converges in the mean for almost every $\omega$. 

Combining this fact with the observation that (7) converges almost 
everywhere to $x(t, \omega)$ we see that, for almost every $\omega$, the series (7) converges in 
the mean to $x(t, \omega)$ and that consequently 

(8) $\int_0^T x^2(t, \omega)\, dt = \sum_j \lambda_j G_j^2(\omega)$ 

for almost every $\omega$. 

It should be noted that (8) could not, in general, be derived by just appealing 
to Parseval's relation. The main reason is that Parseval's relation holds only 
for complete orthonormal systems whereas the orthonormal system $\{f_n(t)\}$ of 
eigenfunctions may fail to be complete. If the kernel $\rho(s - t)$ is 
positive-definite (in which case all the eigenvalues are positive instead of just 
non-negative) then it is known that the eigenfunctions form a complete set. This 
actually happens to be the case in most physical applications. 

6. An important application of (8) is the calculation of the characteristic 
function of the distribution function of the random variable 

(9) $I = \int_0^T x^2(t, \omega)\, dt.$ 

In fact, 

(10) $E\{\exp(i\xi I)\} = \prod_j E\{\exp(i\xi \lambda_j G_j^2)\} = \prod_j (1 - 2i\xi\lambda_j)^{-\frac{1}{2}}.$ 

The probability density of $I$ is the Fourier integral 

\[
\frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-i\xi I) \prod_j (1 - 2i\xi\lambda_j)^{-\frac{1}{2}}\, d\xi
\]

which, unfortunately, in most cases cannot be calculated explicitly. If 

\[
\rho(\tau) = e^{-\beta|\tau|},
\]

in which case the process is also Markoffian, the eigenvalues $\lambda_j$ can be 
calculated explicitly³ but in more complicated cases it is quite difficult to 
determine them. 
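For a finite list of eigenvalues the product form of the characteristic function is easy to evaluate. The following is a minimal sketch assuming each factor is the characteristic function of $\lambda_j G_j^2$ with $G_j$ standard normal, that is $(1 - 2i\xi\lambda_j)^{-1/2}$ on the principal branch; the eigenvalues passed in are made-up inputs, not values from the paper.

```python
def characteristic_function(xi, eigenvalues):
    """Product over j of (1 - 2 i xi lambda_j)^(-1/2), principal branch for each factor."""
    value = 1.0 + 0.0j
    for lam in eigenvalues:
        value *= (1 - 2j * xi * lam) ** (-0.5)
    return value


def mean_of_integral(eigenvalues):
    """E{I} = sum of the eigenvalues (equal to T for the normalized process)."""
    return sum(eigenvalues)


# Two equal eigenvalues 1/2 give the sum of two squared normals times 1/2,
# an exponential variable with mean 1, whose characteristic function is 1 / (1 - i xi).
value = characteristic_function(1.0, [0.5, 0.5])
```

The two-eigenvalue case is a convenient check because the product collapses to $1/(1 - i\xi)$ exactly.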

7. If $\rho(\tau)$ is absolutely integrable and $\sigma(u)$ absolutely continuous then, setting 

\[
A(u) = \sigma'(u),
\]

we have $A(u) \ge 0$ and 

\[
\rho(\tau) = \int_{-\infty}^{\infty} \cos u\tau \, A(u)\, du = \int_{-\infty}^{\infty} e^{iu\tau} B(u)\, du, \qquad B(u) = \tfrac{1}{2}[A(u) + A(-u)].
\]

It can then be shown⁴ that 

\[
\lim_{T \to \infty} \frac{1}{T} \sum_j \lambda_j^2 = 2\pi \int_{-\infty}^{\infty} B^2(u)\, du = \int_{-\infty}^{\infty} \rho^2(\tau)\, d\tau
\]

and 

\[
\lim_{T \to \infty} \frac{1}{T} \sum_j \lambda_j^3 = (2\pi)^2 \int_{-\infty}^{\infty} B^3(u)\, du.
\]

³ See [6], in particular section 4. We take this opportunity to correct two misprints in 
this note. In the last formula on p. 64 $M$ should be replaced by $N$. Also the limits of 
integration in formula (6) should be $0, s$ and $s, p + q$ instead of $0, p + q$ and $0, p + q$. 

The N.D.R.C. Report 14-305 to which a reference is made has been declassified in the 
meantime. It contains results which originated both [1] and the present note. 

⁴ These and related results were stated in the abstract [7] by M. Kac. The paper is now 
being prepared for publication. 

It follows now by standard methods that the characteristic function of 



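approaches, as $T \to \infty$, 

\[
\exp\left(-\tfrac{1}{2}\sigma^2 \xi^2\right),
\]

where 

\[
\sigma^2 = \int_{-\infty}^{\infty} \rho^2(\tau)\, d\tau.
\]

Thus, as $T \to \infty$, the distribution of (11) becomes normal with mean 0 and 
variance $\sigma^2$. 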

REFERENCES 

[1] M. Kac and A. J. F. Siegert, "On the theory of random noise in radio receivers with 
square law detectors." To appear in Jour. of Applied Physics. 

[2] A. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. 
Ann., Vol. 109 (1934), pp. 604-615. 

[3] N. Wiener, "Generalized harmonic analysis," Acta Math., Vol. 55 (1930), pp. 117-258. 

[4] G. Hamel, Integralgleichungen, Julius Springer, Berlin, 1937. 

[5] A. Kolmogoroff, "Über die Summen durch den Zufall bestimmter unabhängiger 
Grössen," Math. Ann., Vol. 99 (1928), pp. 309-319. 

[6] M. Kac, "Random walk in the presence of absorbing barriers," Annals of Math. Stat., 
Vol. 16 (1945), pp. 62-67. 

[7] M. Kac, "Distribution of eigenvalues of certain integral equations with an application 
to roots of Bessel functions," Abstract, Bull. Amer. Math. Soc., Vol. 52 (1946), 
pp. 65-66. 


APPROXIMATE FORMULAS FOR THE RADII OF CIRCLES 
WHICH INCLUDE A SPECIFIED FRACTION OF A 
NORMAL BIVARIATE DISTRIBUTION 

By E. N. Oberg 
University of Iowa 

1. Introduction. Given the normal bivariate error distribution 

(1) $\varphi(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-(x^2/2\sigma_x^2 + y^2/2\sigma_y^2)},$ 

the purpose of this paper is to present certain approximate formulas for the 
radii of circles whose centers are at the origin, which include a prescribed 
proportion, $p$, of errors. The formulas are, for given $\sigma_x$, $\sigma_y$, and $p$, 

(2) $R_1 = \sqrt{2\sigma_x\sigma_y \ln(1/[1 - p])},$ 

(3) $R_2 = \sqrt{(\sigma_x^2 + \sigma_y^2) \ln(1/[1 - p])},$ 

and 

(4) $R_3 = (\sigma_x + \sigma_y)\sqrt{\tfrac{1}{2} \ln(1/[1 - p])}.$ 

In section 3 we present tables of $p'$, the true proportion of errors contained in 
circles whose radii are given by the above formulas. These tables reflect the 
goodness of approximation of each formula to the true radius, $R$, for $0.1 \le p \le 0.9$ 
and $0.5 \le \sigma_x/\sigma_y \le 0.9$. Also, a brief statement is included for the same range 
of $p$ but with $0.1 \le \sigma_x/\sigma_y \le 0.4$. 

2. The derivation of the formulas. The proportion $p$ of errors that fall 
within an area $A$ on the $xy$-plane is given by 

(5) $p = \iint_A \varphi(x, y)\, dA.$ 

If the area is bounded by any member of the family of ellipses 

\[
x^2/\sigma_x^2 + y^2/\sigma_y^2 = \lambda^2,
\]

the above integral may be evaluated directly. The result is 

\[
p = 1 - e^{-\lambda^2/2},
\]

whence 

\[
\lambda^2 = 2 \ln(1/[1 - p]).
\]

Thus the ellipse with semi-axes 

(6) $\sigma_x\sqrt{2 \ln(1/[1 - p])}, \quad \sigma_y\sqrt{2 \ln(1/[1 - p])},$ 

measured from the origin along the $x$ and $y$ axes respectively, will include 
exactly the prescribed proportion of errors. 
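As a worked instance of the ellipse result, take $p = 0.5$ (a value chosen here only for illustration):

```latex
\[
\lambda^2 = 2\ln\frac{1}{1 - 0.5} = 2\ln 2 \approx 1.3863,
\qquad
\lambda \approx 1.1774,
\]
\[
\text{semi-axes} \approx 1.1774\,\sigma_x \ \text{and}\ 1.1774\,\sigma_y,
\qquad
1 - e^{-\lambda^2/2} = 1 - e^{-\ln 2} = 0.5 .
\]
```

The last line simply confirms that the ellipse with these semi-axes contains half of the errors.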

Frequently, however, it is desired to know which circles rather than which 
ellipses include a certain proportion of the errors. In this case it becomes 
difficult to obtain a formula for the true radius from (5) unless $\sigma_x = \sigma_y$, in which 
case $R$ is given by either one of the formulas in (6). However, a natural 
approximation to make is to equate the area of a circle of radius, say $R_1$, to the area 
of the ellipse whose semi-axes are given in (6). This gives formula (2), 

\[
R_1 = \sqrt{2\sigma_x\sigma_y \ln(1/[1 - p])},
\]

which can be expected to give a fairly close approximation to true $R$ if $\sigma_x$ is 
close to $\sigma_y$. If $\sigma_x \ne \sigma_y$, it has been shown that this formula underestimates 
true $R$, which is undesirable in some applications [1]. That is, if $R_1$ is used to 
estimate, say the radius of a circle to include 50% of the errors ($p = .5$), it will 
give a value which includes less than the desired proportion. The first table in 
the last section gives a numerical verification of this fact. 


To obtain formula (3) we consider formula (5) when $A$ is a circle of radius $R$. 
We have 

\[
p = 4 \int_0^R \int_0^{\sqrt{R^2 - x^2}} \varphi(x, y)\, dy\, dx.
\]

By making the transformation $x = \sigma_x r \cos\theta$, $y = \sigma_y r \sin\theta$, and by carrying out 
the integration with respect to $r$ the above formula becomes 

\[
p = 1 - (2/\pi) \int_0^{\pi/2} \exp\left\{-\frac{R^2}{2(\sigma_x^2\cos^2\theta + \sigma_y^2\sin^2\theta)}\right\} d\theta.
\]

We let 

\[
\alpha = R^2/(\sigma_x^2 + \sigma_y^2), \qquad \beta = (\sigma_y^2 - \sigma_x^2)/(\sigma_y^2 + \sigma_x^2),
\]

and 

\[
\sigma_x/\sigma_y = \epsilon, \qquad \sigma_x \le \sigma_y.
\]

Then 

$\alpha = R^2/\sigma_y^2(1 + \epsilon^2)$, and $\beta = (1 - \epsilon^2)/(1 + \epsilon^2)$, which is less than unity. 
This substitution will be helpful later in preparing tables. The fact that $\sigma_x$ 
is taken less than $\sigma_y$ places no limitation on the final results since we only have 
to interchange axes in the other case. The above integral may now be written 
as 

(7) $p = 1 - (2/\pi) \int_0^{\pi/2} e^{-\alpha/(1 - \beta\cos 2\theta)}\, d\theta = 1 - (2/\pi)\, e^{-\alpha} \int_0^{\pi/2} e^{-\alpha\beta\cos 2\theta/(1 - \beta\cos 2\theta)}\, d\theta.$ 

The integrand, say $F(\theta)$, in the last integral of (7) can be shown to be monotone 
increasing from $e^{-\alpha\beta/(1 - \beta)}$ to $e^{\alpha\beta/(1 + \beta)}$ as $\theta$ varies from 0 to $\pi/2$. Furthermore, it crosses 
the line $F(\theta) = 1$ somewhere in this interval and differs but little from it 
anywhere if the ratio $\sigma_x/\sigma_y$ is close to 1, since $\beta$ is then close to zero. If, therefore, 
we replace the integrand by $F(\theta) = 1$, we have $p = 1 - e^{-\alpha}$. Hence, if $\alpha$ is 
replaced by $R^2/(\sigma_x^2 + \sigma_y^2)$ and the result solved for $R$, we have formula (3), 

\[
R_2 = \sqrt{(\sigma_x^2 + \sigma_y^2) \ln(1/[1 - p])}.
\]

Finally, formula (4), 

Rs = {<r* + OV(i) In (1/[1 - pj), 

is obtained by taking the root-mean-square of the former two. This formula 
has certain advantages over the other two, the most obvious being that cr x and 
a y enter linearly so that it is simple to evaluate for given <r x , <r v , and p. Sec¬ 
ondly it will be seen by the tables and additional comments made in the last 
section that when p = 0.5, 1 Rz overestimates true R by a slight amount for all 


‘This particular value of p gives the circular probable error. In this case Ri «• 
0.5887(v. + a,). 



APPROXIMATE FORMULAS 


445 


values of <r x /<r v , and it gives a fairly close approximation to true R for all p 
when a x /cr v ^ 0.5. 
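As an illustration only, formulas (2), (3) and (4) can be evaluated side by side with a few lines of Python. The function name `approx_radii` is a hypothetical helper introduced here and is not part of the paper; the sketch assumes positive standard deviations and $0 < p < 1$.

```python
import math


def approx_radii(sigma_x, sigma_y, p):
    """Return (R1, R2, R3) from formulas (2), (3) and (4).

    Hypothetical helper for illustration; sigma_x and sigma_y are the
    standard deviations along the two axes, p the desired proportion.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_term = math.log(1.0 / (1.0 - p))
    r1 = math.sqrt(2.0 * sigma_x * sigma_y * log_term)       # formula (2)
    r2 = math.sqrt((sigma_x**2 + sigma_y**2) * log_term)     # formula (3)
    r3 = (sigma_x + sigma_y) * math.sqrt(0.5 * log_term)     # formula (4)
    return r1, r2, r3
```

For any inputs the three values satisfy $R_1 \le R_3 \le R_2$, since $R_3$ is the root-mean-square of the other two, and all three coincide when $\sigma_x = \sigma_y$.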

We close this section by making a few brief comments. In the first place,
if any of the above formulas is to be computed from a sample of data, we take
$\sqrt{\Sigma x^2/(n-1)}$ and $\sqrt{\Sigma y^2/(n-1)}$ as estimates of $\sigma_x$ and $\sigma_y$ respectively. Furthermore,
we test the significance of these statistics by known formulas [2].
Finally, $\sigma_x$ and $\sigma_y$ may be replaced by $D_x\sqrt{\pi/2}$ and $D_y\sqrt{\pi/2}$, where $D_x$ is the
population mean deviation. Thus, for example,

$$R_3 = (D_x + D_y)\sqrt{(\pi/4)\ln(1/[1-p])}.$$




3. Tables. The first formula in (7) is useful in testing by means of numerical
integration the goodness of approximation of the formulas $R_1$, $R_2$, and $R_3$ to
the true value of $R$. We construct the tables by replacing $R$ in $a$ by one of these
formulas, say formula $R_1$. This gives $a = [2\epsilon/(1 + \epsilon^2)]\ln[1/(1 - p)]$. Since
$\beta = (1 - \epsilon^2)/(1 + \epsilon^2)$, the right hand side of the formula in (7) may then be
evaluated for a choice of $\epsilon$ and $p$, giving a value we denote by $p'$. This is the
actual proportion of errors that is included in the circle whose radius is $R_1$.
If $R_1$ gave true $R$, then $p'$ would be equal to $p$, so we may regard the difference
of $p$ and $p'$ as a measure of the error arising when $R_1$ is used to estimate $R$.

In the following tables the chosen values of $p$ and $\epsilon = \sigma_x/\sigma_y$ are listed in the
first row and column respectively. The remainder of the tables include the
corresponding values of $p'$.

We also have computed tables for $0.1 \le \sigma_x/\sigma_y \le 0.4$, which we have not
included in this paper since for this range of values of $\sigma_x/\sigma_y$ all of the formulas
give approximations that depart considerably from true $R$ except when $p = 0.5$.
For this case, $p' = .4776$, $.5004$, $.5109$, and $.5120$ when $\sigma_x/\sigma_y = 0.1$, $0.2$,
$0.3$, and $0.4$ respectively.

TABLE I

p' computed by means of formula R_1

 ε \ p    .1     .2     .25    .3     .4     .5     .6     .7     .75    .8     .9
  .5    .0988  .1951  .2425  .2893  .3815  .4720  .5615  .6508  .6960  .7422  .8408
  .6    .0944  .1974  .2459  .2942  .3899  .4846  .5786  .6726  .7198  .7676  .8668
  .7    .0997  .1987  .2480  .2972  .3950  .4924  .5894  .6864  .7350  .7838  .8835
  .8    .0999  .1995  .2492  .2989  .3981  .4970  .5958  .6946  .7440  .7936  .8935
  .9    .1000  .1999  .2498  .2997  .3996  .4993  .5991  .6988  .7483  .7986  .8985
 1.0    .1000  .2000  .2500  .3000  .4000  .5000  .6000  .7000  .7500  .8000  .9000
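The construction of the tables can be sketched numerically. The following Python is a minimal illustration, not the author's computing procedure: it evaluates the first integral in (7) by the midpoint rule (an assumption made here; the paper does not say which quadrature was used). The names `included_proportion` and `p_prime_r3` are hypothetical.

```python
import math


def included_proportion(R, sigma_x, sigma_y, steps=2000):
    """Proportion of errors inside a circle of radius R, from formula (7).

    The integral over [0, pi/2] is approximated by the midpoint rule;
    `steps` is an illustrative choice, not a value from the paper.
    """
    s2 = sigma_x**2 + sigma_y**2
    a = R**2 / s2
    beta = (sigma_y**2 - sigma_x**2) / s2
    h = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        total += math.exp(-a / (1.0 - beta * math.cos(2.0 * theta)))
    return 1.0 - (2.0 / math.pi) * total * h


def p_prime_r3(eps, p):
    """p' obtained when R is replaced by R_3, with sigma_y = 1 and sigma_x = eps."""
    r3 = (eps + 1.0) * math.sqrt(0.5 * math.log(1.0 / (1.0 - p)))
    return included_proportion(r3, eps, 1.0)
```

With $\epsilon = 1$ the integrand is constant and the routine returns $p$ itself, which gives a simple check of the implementation before it is applied to $\epsilon < 1$.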





The difference between an entry in a column and the corresponding value
of $p$ at the head of the column reflects the error in estimating true $R$ by means of
$R_1$, $R_2$, and $R_3$. For example, if $p$ is chosen as .5 and $\sigma_x/\sigma_y = .7$, then $R_3$
gives the radius of a circle which includes 50.31% of the errors (Table III). Thus $R_3$
overestimates true $R$ by including .31% more of the errors.

By examining the tables it is seen that when $0.1 \le p \le 0.3$, $R_1$ gives the best
approximation to the true value of $R$, while $R_2$ gives the poorest. If $0.4 \le p \le 0.75$,
$R_3$ gives the best and $R_2$ the poorest; and if $0.8 \le p \le 0.9$, $R_2$ gives the best
and $R_1$ the poorest. Thus formula $R_3$ gives the best overall approximation for
general use. It may be remarked at this point that bounds for the true
value of $R$ can be found by applying two of the formulas, one of which overestimates
while the other underestimates $R$. From the tables it is apparent that
this can be done for values of $p \le 0.8$.

TABLE II

p' computed by means of formula R_2

 ε \ p    .1     .2     .25    .3     .4     .5     .6     .7     .75    .8     .9
  .5    .1215  .2363  .2912  .3446  .4467  .5432  .6346  .7217  .7641  .8060  .8907
  .6    .1116  .2202  .2732  .3255  .4274  .5261  .6218  .7146  .7600  .8050  .8949
  .7    .1057  .2100  .2616  .3127  .4140  .5136  .6116  .7081  .7558  .8032  .8976
  .8    .1022  .2039  .2546  .3051  .4056  .5055  .6048  .7034  .7525  .8014  .8991
  .9    .1005  .2009  .2509  .3012  .4013  .5012  .6011  .7008  .7506  .8003  .8999
 1.0    .1000  .2000  .2500  .3000  .4000  .5000  .6000  .7000  .7500  .8000  .9000

TABLE III

p' computed by means of formula R_3

 ε \ p    .1     .2     .25    .3     .4     .5     .6     .7     .75    .8     .9
  .5    .1102  .2161  .2674  .3176  .4152  .5092  .6001  .6887  .7327  .7768  .8694
  .6    .1056  .2089  .2597  .3100  .4090  .5059  .6009  .6944  .7408  .7872  .8817
  .7    .1027  .2044  .2548  .3050  .4046  .5031  .6007  .6974  .7456  .7937  .8908
  .8    .1011  .2017  .2519  .3020  .4018  .5013  .6003  .6991  .7483  .7976  .8963
  .9    .1003  .2004  .2504  .3004  .4004  .5003  .6001  .6998  .7496  .7995  .8992
 1.0    .1000  .2000  .2500  .3000  .4000  .5000  .6000  .7000  .7500  .8000  .9000

Finally, these formulas may be used to test roughly the normality of the data.
For example, if proper estimates² of $\sigma_x$ and $\sigma_y$ are made from the data, and the
corresponding value of $R_3$ computed for a chosen $p$, then approximately the
proportion $p'$ of plotted errors should fall within the circle of radius $R_3$.

² See section 2.
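The rough normality check just described might be carried out as in the sketch below. It assumes, as an interpretation of section 2, that the errors are measured as deviations from the sample means and that $\sigma_x$, $\sigma_y$ are estimated with divisor $n - 1$; the function name `fraction_inside_r3` is hypothetical.

```python
import math


def fraction_inside_r3(xs, ys, p):
    """Return (R3, observed fraction of errors inside the circle of radius R3).

    Assumption: errors are deviations from the sample means, and the
    standard deviations are estimated with divisor n - 1.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sigma_x = math.sqrt(sum(d * d for d in dx) / (n - 1))
    sigma_y = math.sqrt(sum(d * d for d in dy) / (n - 1))
    r3 = (sigma_x + sigma_y) * math.sqrt(0.5 * math.log(1.0 / (1.0 - p)))
    inside = sum(1 for u, v in zip(dx, dy) if u * u + v * v <= r3 * r3)
    return r3, inside / n
```

The observed fraction is then compared with the tabulated $p'$ for the estimated ratio $\sigma_x/\sigma_y$; a large discrepancy suggests a departure from normality.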

REFERENCES 

[1] Henry Scheffé, Armor and Ordnance Report No. A-224, OSRD No. 1918, Div. 2,
pp. 60-61.

[2] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 131.


A NOTE ON THE EFFICIENCY OF THE WALD SEQUENTIAL TEST 

By Edward Paulson 


Institute of Statistics , University of North Carolina 


The sequential likelihood ratio test of Wald for testing the hypothesis $H_0$
that the probability density function is $f(X, \theta_0)$ against the one-sided alternative
$H_1$ that the function is $f(X, \theta_1)$ has been shown [1] to have the optimum property
of minimizing the expected number of observations at the two points $\theta = \theta_0$
and $\theta = \theta_1$. Tables showing the actual magnitude of the percentage saving
of this sequential procedure compared with the classical “best” non-sequential
test have been calculated (see [1], page 147) for the normal case when

$$f(X, \theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x - \theta)^2}{2}\right].$$

In this note we will show that when $\theta_1$ is close to $\theta_0$, the percentage saving is
independent of the particular function $f(X, \theta)$ and the particular values $\theta_1$
and $\theta_0$, so that the tables mentioned above can be used to show the percentage
saving for any one-sided sequential test involving a single parameter, provided
$f(X, \theta)$ satisfies some weak restrictions.

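Let $f(X, \theta)$ be the probability density function of a random variable. Let
$E_i(n)$ denote the expected value (when $\theta = \theta_i$) of the number of independent
observations required by the Wald sequential procedure to test the hypothesis
$H_0$ that $\theta = \theta_0$ against $\theta = \theta_1 = \theta_0 + \Delta$ with probabilities $\alpha$ of rejecting $H_0$
when $\theta = \theta_0$ and $\beta$ of accepting $H_0$ when $\theta = \theta_1$. Let $N$ be the number of
independent observations required to achieve the same probabilities $\alpha$ and $\beta$
for testing the hypothesis $\theta = \theta_0$ against $\theta = \theta_1$ by the most powerful
non-sequential test. Let $U_\alpha$ and $U_\beta$ be defined by the relations

$$\frac{1}{\sqrt{2\pi}}\int_{U_\alpha}^{\infty} e^{-t^2/2}\,dt = \alpha
\quad\text{and}\quad
\frac{1}{\sqrt{2\pi}}\int_{U_\beta}^{\infty} e^{-t^2/2}\,dt = \beta.$$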





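We will prove the following theorem:

$$\lim_{\Delta\to 0}\frac{E_0(n)}{N} = \frac{2\left\{\alpha\log\dfrac{\alpha}{1-\beta} + (1-\alpha)\log\dfrac{1-\alpha}{\beta}\right\}}{(U_\alpha + U_\beta)^2},$$

provided $f(X, \theta)$ satisfies the following conditions:

(A) $\int_{-\infty}^{\infty} f(X, \theta)\,dx$ can be differentiated twice under the integral sign with respect
to $\theta$.

(B) All four of the integrals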

£j£$ r(M ' ) <fa> 

CftS*-** 


are continuous functions of $\theta^*$ at $\theta^* = \theta_0$. A sufficient condition for (B) is that
all the integrals be uniformly convergent with respect to $\theta^*$ in some interval
$\theta_0 < \theta^* < \theta_0 + \Delta$, and all the integrands be continuous functions of $X$ and $\theta^*$. A
similar theorem holds regarding the limit of $E_1(n)/N$ as $\Delta \to 0$.

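The proof is as follows: From [1], we know that

$$E_0(n) = \frac{\alpha\log\dfrac{1-\beta}{\alpha} + (1-\alpha)\log\dfrac{\beta}{1-\alpha}}{E_0(z)} + o(1),$$

where

$$z = \log\frac{f(x, \theta_1)}{f(x, \theta_0)}$$

and $o(1) \to 0$ as $\Delta \to 0$.

Now

$$E_0(z) = \int_{-\infty}^{\infty}\left[\log\frac{f(x, \theta_1)}{f(x, \theta_0)}\right] f(x, \theta_0)\,dx
= \int_{-\infty}^{\infty}[\log f(x, \theta_0 + \Delta)]\,f(x, \theta_0)\,dx - \int_{-\infty}^{\infty}[\log f(x, \theta_0)]\,f(x, \theta_0)\,dx.$$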





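Expanding $\log f(x, \theta_0 + \Delta)$ in a Taylor series about $\Delta = 0$, we have

$$\log f(x, \theta_0 + \Delta) = \log f(x, \theta_0) + \Delta\left[\frac{f'}{f}\right]_{\theta_0}
+ \frac{\Delta^2}{2}\left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta_0} + \frac{\Delta^2}{2}R_1,$$

where

$$\theta_0 < \theta^* < \theta_0 + \Delta, \qquad f' = \frac{\partial f}{\partial\theta}, \qquad f'' = \frac{\partial^2 f}{\partial\theta^2},$$

and

$$R_1 = \left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta^*} - \left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta_0}.$$

From assumption (A) we find that

$$\int_{-\infty}^{\infty} f'(x, \theta_0)\,dx = 0 \quad\text{and}\quad \int_{-\infty}^{\infty} f''(x, \theta_0)\,dx = 0,$$

while from assumption (B)

$$\int_{-\infty}^{\infty} R_1 f(x, \theta_0)\,dx \to 0 \quad\text{as } \Delta \to 0.$$

Therefore

$$E_0(z) = -\frac{\Delta^2}{2}\left\{\int_{-\infty}^{\infty}\left[\frac{f'(x, \theta_0)}{f(x, \theta_0)}\right]^2 f(x, \theta_0)\,dx + o(1)\right\}.$$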

To find $N$ for the most powerful non-sequential test, we make use of the fact
(see [2]) that an asymptotically most powerful test for one-sided alternatives is
given by a region of the type

tt _ — *> ^ ir 

Un ~ VNhJM) ~ • 

When $\Delta \to 0$, $N \to \infty$, and since $\bar{U}$ is the sum of $N$ independent variates with
a finite second moment, the distribution of $(\bar{U} - E(\bar{U}))/\sigma_{\bar{U}}$ approaches that of a
normal variate with zero mean and unit variance. Hence we find the $N$ required
for a test with Type I and Type II errors $\alpha$ and $\beta$ by solving for $N$ from
the relations

* -V. 


( 1 ) 

and 


K-Vn El (tj 


M0L-WSU 


- -u, 


( 2 ) 




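Now let $y = f'(x, \theta_0)/f(x, \theta_0)$, and we find from (1) and (2) that

$$N = \left[\frac{U_\alpha\sqrt{E_0(y^2)} + U_\beta\sqrt{E_1(y^2) - [E_1(y)]^2}}{E_1(y)}\right]^2.$$

Now

$$E_1(y) = \int_{-\infty}^{\infty}\frac{f'(x, \theta_0)}{f(x, \theta_0)}\,f(x, \theta_1)\,dx
= \Delta\int_{-\infty}^{\infty}\frac{[f'(x, \theta_0)]^2}{f(x, \theta_0)}\,dx
+ \Delta\int_{-\infty}^{\infty}\frac{f'(x, \theta_0)}{f(x, \theta_0)}\,[f'(x, \theta^*) - f'(x, \theta_0)]\,dx$$

$$= \Delta E_0(y^2)[1 + o(1)] \quad\text{from assumption (B).}$$

Proceeding in a similar manner, we find

$$\left[U_\alpha\sqrt{E_0(y^2)} + U_\beta\sqrt{E_1(y^2) - [E_1(y)]^2}\right]^2 = E_0(y^2)\,[U_\alpha + U_\beta(1 + o(1))]^2.$$

We now have

$$\frac{E_0(n)}{N} = \frac{[\Delta E_0(y^2)]^2(1 + o(1))^2}{E_0(y^2)\,[U_\alpha + U_\beta(1 + o(1))]^2}\times\frac{\alpha\log\dfrac{1-\beta}{\alpha} + (1-\alpha)\log\dfrac{\beta}{1-\alpha}}{E_0(z)};$$

therefore, since $E_0(z) = -\tfrac{1}{2}\Delta^2 E_0(y^2)[1 + o(1)]$,

$$\lim_{\Delta\to 0}\frac{E_0(n)}{N} = \frac{2\left\{\alpha\log\dfrac{\alpha}{1-\beta} + (1-\alpha)\log\dfrac{1-\alpha}{\beta}\right\}}{(U_\alpha + U_\beta)^2}.$$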
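To give a feeling for the size of the limiting ratio, it can be evaluated numerically. The sketch below is illustrative only: it takes $U_\alpha$ and $U_\beta$ as upper-tail points of the unit normal distribution, computed with `statistics.NormalDist` from the Python standard library, and the function name `limiting_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def limiting_ratio(alpha, beta):
    """Limit of E_0(n)/N as Delta -> 0, per the theorem of this note.

    U_alpha and U_beta are taken as the upper alpha and beta points of
    the unit normal distribution (an assumption about the garbled
    definition in the scanned text); requires alpha + beta < 1.
    """
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    u_beta = NormalDist().inv_cdf(1.0 - beta)
    numerator = 2.0 * (alpha * math.log(alpha / (1.0 - beta))
                       + (1.0 - alpha) * math.log((1.0 - alpha) / beta))
    return numerator / (u_alpha + u_beta) ** 2
```

For $\alpha = \beta = .05$ the ratio is about 0.49, so that in the limit the sequential test requires on the average roughly half the observations of the best non-sequential test when $\theta = \theta_0$.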


REFERENCES 

[1] A. Wald, “Sequential tests of statistical hypotheses”, Annals of Math. Stat., Vol. 
16 (1945). 

[2] A. Wald, “Some examples of asymptotically most powerful tests”, Annals of Math. 
Stat., Vol. 12 (1941). 


A NOTE ON THE POISSON-CHARLIER FUNCTIONS¹

By C. Truesdell 
Naval Ordnance Laboratory 
The polynomials p n (m, z ) given by the definition 

(1) Pn (m,:) S (-reV*^[r 8 ”J, 

1 This note was written while the author was employed by the Radiation Laboratory, 
M.I.T. 





called the Poisson-Charlier polynomials, and the associated functions $\psi_n(m, z)$
given by the definition

(2) $\psi_n(m, z) \equiv p_n(m, z)\,\psi_0(m, z)$,

(3) $\psi_0(m, z) \equiv e^{-z}z^m/m!$,

occur in statistics. Doetsch [1] has devoted a memoir to them, and they are
noticed in Szegő's Orthogonal Polynomials (pp. 33-34).

I suggest that they are most directly and easily studied in connection with
the “F-equation”

(4) $\dfrac{\partial}{\partial z}F(z, \alpha) = F(z, \alpha + 1)$,

whose properties and application to various special functions I have summarized
in a recent note [2]. Using the theorems of that note, which I shall
cite by number, I shall now generalize the Poisson-Charlier polynomials and
sketch the speediest derivation of their most interesting formal properties.
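For integer indices the connection with the F-equation can be illustrated numerically. The sketch below rests on an assumption stated plainly: that $\psi_n(m, z)$ is the $n$-th derivative with respect to $z$ of $\psi_0(m, z) = e^{-z}z^m/m!$, which is what the F-equation (4) demands of a solution indexed by $n$. Differentiating $\psi_0$ once gives $\psi_0(m-1, z) - \psi_0(m, z)$, and the recursion used here follows. The function names are hypothetical.

```python
import math


def psi(n, m, z):
    """psi_n(m, z) for integers n >= 0 and m, ASSUMING psi_n = d^n psi_0 / dz^n.

    Uses the recursion psi_{n+1}(m, z) = psi_n(m - 1, z) - psi_n(m, z),
    with psi_n(m, z) = 0 for m < 0.
    """
    if m < 0:
        return 0.0
    if n == 0:
        return math.exp(-z) * z**m / math.factorial(m)
    return psi(n - 1, m - 1, z) - psi(n - 1, m, z)


def p_poly(n, m, z):
    """Polynomial factor p_n(m, z) = psi_n(m, z) / psi_0(m, z), as in definition (2)."""
    return psi(n, m, z) / psi(0, m, z)
```

Under this assumption $p_1(m, z) = m/z - 1$, and the recursion can be checked against a finite-difference derivative of $\psi_0$. The recursion is exponential in $n$ and is meant only for small orders.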

Greek letters shall represent unrestricted real numbers, while Latin letters 
shall represent integers. 

From the existence theorem for the F-equation (Theorem 4) we know that
there exists an integral function of $z$, $F_\beta(z, \alpha)$, which satisfies the F-equation
and the condition

(5) Ffi( 0, a) = cos(o: + 0)r • 

From the uniqueness theorem for the F-equation (Theorem 4) it follows that 

(6) Fp(z, n — 1 3 + £) = 0, 

(7) Fp(z, n) = 0, n > 0. 

From the general power series solution for the F-equation (Theorem 4) we have 
the formula 

-a)i^i( aJ £ + a + 1> *)• 

We now define the Poisson-Charlier functions in general by the formulas 

(9) pt(a, z) m r(« + l)z~ a Fp(z, -a), 

(10) M<*, z) ■ Pfi(a, 2 ). 

From the formulas (6) and (7) we see that [1, p. 263] 

(11) M~n,z) = 0, n > 0; 


( 8 ) 


F/j(z, a) = cos (a + 


( 





(12) M- n + P - i z) = 0, P/s(~ n + 0 - J, *) = 0. 

From the formula (8) we see that 

(13) z) = cos (/3 — a)7r — a + 1;*)» 

whence it follows at once that 

(14) pp(m y z) = cos 07r 53 ^ k\{ — z)~ k . 

This is the usual explicit expression for the Charlier polynomials [1, p. 257]. 
From formula (13) we see that 

(15) po(—a, z) = r(l - «)A(«, *). 

In the indeterminate case when a is a negative integer we see from the formula 

(14) that 

(16) p 0 (m , z) — 1, m ^ 0. 

Hence 

(17) ^o( — a, z) = e“Va:, z), 

(18) ifo(m, a) = . 

From the definition (10) we now see that 

(19) yp&(m y z) = z)t o(w, *), 

a generalization of the formula (2). From the formula (13) and the definition 
(10) we see that 

(20) ik,(0, z) = cos (0 - a ) 7r r ( / 3 + + 

Then by Kummer’s first transformation, 

(21) tattyz) = cos(0- «)*■ p - q; j) iW« + i;«-^ + i;-*)i 

from which it follows from the power series formula for solutions of the F-equa- 
tion (Theorem 4) that ^«(0, z) is a solution of the F-equation (4). 

We now have two different solutions of the F-equation based on the Poisson- 
Charlier functions: 


(A) 


F(z, a) = eV/s(-a, z). 





(B) F(z, a) = + a (p, z). 

From the F-equation it is evident that 

(22) = ^>0?,*), 

whence we at once deduce the formula (1). Applying Taylor’s theorem for the 
F-equation (Theorem 8) to the solution (B) we see that [1, p. 259] 

(23) ta<J3, z + h) = £ ^ t a+n (0, z); 

n— o n\ 

putting a equal to zero we find that 

(24) sin2/Jir z + h ) = ± *„(£, *), 

AT n-0 7l\ 

and, more specially [1, p. 260] 

(25) 

Applying the same theorem to the solution (A) we obtain the formula 

(26) e h \'l'e(ot, z + h) = ]C yp»(a — n, z), 

n -0 ni 

whence we recover the formula (11) by putting a equal to zero. 

Applying Theorem 9 to the solution (B) yields the result 

(27) £ 0.+ n (8, z) = [ e"Va(8, 2 + «) d9, 

ItmmO JO 

which contains as a special case the formula 

(28) £ t n p„ (m, z) = (1 + Q m e' <1+1,<) ! - y (m + 1, 2 (l + j))]. 

Appell’s generating expansion (see Theorem 10, part C or [3, p. 120]) applied 
to the solution (A) yields the result 


(29) 
hence 

(30) 


£ h(n, z + y)t" = e ,( ‘ w £ Mn, y)t n \ 


E —j Pfi n > z y) — e < ** ,,< * + * ) 

n-0 71 1 


£(-*-)" 

w> \3 + y) 


Pain, y) 


Putting y equal to zero and using the formula (13) we see that 





(31) 2 P 0 ( n > z) = e* (l — ^ cos /3ir. 

Comparing this result with the formula (25) we see that 

(32) a) = (~) m p A (n, z). 

It would be possible to proceed in this same fashion and discover many other 
formal properties of the Poisson-Charlier functions, but it is perhaps easier to 
notice from the formula (13) that 

(33) Z*(a, z) = cos (/3 - a)nT(a + 1 )z‘ a L^ a \z). 

L](x) being Laguerre’s function suitably generalized for complex lower index 
[4, p. 53]. By means of this formula every relationship involving Laguerre 
functions may be translated into one involving Poisson-Charlier functions. 

REFERENCES 

[1] G. Doetsch, “Die in der Statistik seltener Ereignisse auftretenden Charliersche
Polynome, und eine damit zusammenhängende Differentialdifferenzengleichung”,
Math. Ann., Vol. 109 (1934), pp. 257-266.

[2] C. Truesdell, “On the functional equation $\frac{\partial}{\partial z}F(z, \alpha) = F(z, \alpha + 1)$”, Proc. Nat.
Acad. Sci. U. S. A., April, 1947.

[3] P. Appell, “Sur une classe de polynomes”, Ann. Ecole Norm., Vol. 9 (1880), pp.
119-144.

[4] E. Pinney, “Laguerre functions in the mathematical foundations of the electromagnetic
theory of the paraboloidal reflector”, Jour. Math. Phys. M. I. T.,
Vol. 25 (1946), pp. 49-79.



ABSTRACTS OF PAPERS 


Presented June 17-19, 1947, at the San Diego meeting of the Institute 

1. Random Variables with Comparable Peakedness. Z. W. Birnbaum,
University of Washington.

Let $U$ and $V$ be random variables with symmetrical distributions, i.e. with $P(U \le -T) = P(U \ge T)$
and $P(V \le -T) = P(V \ge T)$ for all $T \ge 0$. The random variable $U$ shall be
called more peaked than $V$ if $P(|U| \ge T) \le P(|V| \ge T)$ for all $T \ge 0$. Let $X_1, Y_1$ and
$X_2, Y_2$ be two pairs of independent random variables such that $X_i$ is more peaked than $Y_i$
for $i = 1, 2$. Then under certain additional conditions $X = X_1 + X_2$ is more peaked than
$Y = Y_1 + Y_2$.

2. On Optimum Tests of Composite Hypotheses with One Constraint. Erich 

L. Lehmann, University of California, Berkeley. 

The problem studied is that of finding all similar and bisimilar test regions of composite 
hypotheses, and of obtaining the most powerful of these regions. Various results are ob¬ 
tained for distributions which admit sufficient statistics with respect to their parameters. 
Applications are made to the hypothesis specifying the value of the circular correlation 
coefficient in a normal population, and certain hypotheses concerning scale and location 
parameters in exponential and rectangular populations. 

3. Estimation of a Distribution Function by Confidence Limits. Frank J. 
Massey, Jr., University of California, Berkeley. 

Let $x_1, x_2, \cdots, x_n$ be the results of $n$ independent observations, having the same cumulative
distribution function $F(x)$. Form the function $S_n(x) = k/n$ where $k$ is the number of
observations less than or equal to $x$. A confidence band $S_n(x) \pm \lambda/\sqrt{n}$ will be used to
estimate $F(x)$. To determine the confidence coefficient it is necessary to find
$\Pr\{\max_x |S_n(x) - F(x)| \le \lambda/\sqrt{n}\}$. It is sufficient to consider $x$ uniformly distributed in the interval
(0,1). Let $\lambda\sqrt{n} = s/t$ where $s$ and $t$ are integers. Then $S_n(x)$, to stay in the band
$F(x) \pm \lambda/\sqrt{n}$, can only pass through certain lattice points above $x = i/tn$, $i = 1, 2, \cdots, tn$. The
probability of $S_n(x)$ passing through a particular sequence of these points is given by the
multinomial law, and this can be summed over all permissible sequences. Limiting
distributions have been given by A. Kolmogoroff and by N. Smirnoff. It is desired to test
the hypothesis $F(x) = F_0(x)$ against alternatives $F(x) = F_1(x)$. Using the criterion: reject
$F_0(x)$ if

$$\max_x \sqrt{n}\,|F_0(x) - S_n(x)| > \lambda,$$

the probability of first kind of error can be controlled by choice of $\lambda$. A lower bound to the
power of the test (one minus the probability of second kind of error) against alternatives
such that $\max_x \sqrt{n}\,|F_0(x) - F_1(x)| \ge \Delta$ is given. This lower bound approaches one as
$n \to \infty$. Thus the test is consistent.
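The test criterion of this abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the author's computing scheme: the maximum of $|F_0(x) - S_n(x)|$ is found by examining $S_n$ just before and at each ordered observation, and the names `ks_statistic` and `reject_f0` are hypothetical. The choice of $\lambda$ for a given confidence coefficient is not addressed here.

```python
import math


def ks_statistic(sample, cdf):
    """Return sqrt(n) * max_x |F0(x) - S_n(x)| for a hypothesized cdf F0.

    S_n(x) = k/n, k being the number of observations <= x, so the largest
    deviation occurs at, or just to the left of, an ordered observation.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def reject_f0(sample, cdf, lam):
    """Apply the criterion: reject F0 when the statistic exceeds lambda."""
    return ks_statistic(sample, cdf) > lam
```

For example, with the three observations 0.1, 0.4, 0.7 and $F_0$ the uniform distribution on (0,1), the largest deviation is 0.3, occurring at the last observation.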

4. A Note on Sequential Confidence Sets. Charles Stein, Columbia Uni¬ 
versity. 

This paper generalizes a paper of Stein and Wald, appearing in the Annals of Math. Stat.,
Sept., 1947.

Let $\{X_i\}$, $(i = 1, 2, \cdots)$, be a sequence of random variables whose distribution depends
on an unknown parameter $\theta$. Sequential confidence sets are determined by a rule indicating
when to stop sampling and a rule giving the confidence set as a function of the sample. It
is desired that, for each sample point, the confidence set should be one of a specified class $S$,
that the probability of covering the true parameter should be $\ge \alpha$, and that the least upper
bound of the expected number of observations should be minimized. If the $X_i$ are
independent with the rectangular distribution on $(0, \theta)$ and $S$ consists of all intervals of the
form $(\theta_0, k\theta_0)$ with $k$ fixed and $\theta_0$ a function of the sample, the optimum sequential
procedure is the classical non-sequential procedure. If the $X_i$ are independently and
identically distributed in accordance with a multivariate normal distribution with known
covariance matrix $\Sigma$ but unknown mean $\theta$, and the confidence sets are to be of the form
$(\theta - \theta_0)'\Sigma^{-1}(\theta - \theta_0) \le r$, $r$ fixed, $\theta_0$ a variable p-dimensional vector, a similar result holds, provided
the desired confidence coefficient $\alpha$ is not excessively small.

5. Explicit Solution of the Problem of Fitting a Straight Line when Both
Variables are Subject to Error for the Case of Unequal Weights. Elizabeth
L. Scott, University of California, Berkeley.

Let $\alpha$, $\beta$ and $\xi_i$ $(i = 1, 2, \cdots, s)$ be unknown fixed numbers and let $\eta_i = \alpha + \beta\xi_i$. For
each value of $i$ there exist $m_i$ measurements $x_{ij}$ of $\xi_i$ and $n_i$ measurements $y_{ik}$ of $\eta_i$
$(j = 1, 2, \cdots, m_i;\ k = 1, 2, \cdots, n_i)$. The variables $x_{ij}$ and $y_{ik}$ are normally distributed about
$\xi_i$ and $\eta_i$ with variances $\sigma_1^2/u_i$ and $\sigma_2^2/v_i$ respectively, where the weights $u_i$ and $v_i$ are known
but $\sigma_1^2$ and $\sigma_2^2$ are unknown. The numbers $m_i$ and $n_i$ are bounded (usually small) while $s$
increases indefinitely. Thus $\alpha$, $\beta$, $\sigma_1^2$ and $\sigma_2^2$ appear as structural parameters and the $\xi_i$ as
incidental parameters. (See paper by J. Neyman and E. L. Scott to appear in Econometrica.)
Modified maximum likelihood equations (MMLE) yielding consistent estimates of the
structural parameters are tedious to solve when the products $m_i u_i$ and $n_i v_i$ depend on $i$.
The main result of this paper consists in proving that the varying $m_i u_i$ and/or $n_i v_i$ can be
treated as constants. Let $w_1$ and $w_2$ be the harmonic means of $m_i u_i$ and $n_i v_i$ respectively.
Now, MMLE's written with $m_i u_i = w_1$ and $n_i v_i = w_2$ yield consistent estimates of $\alpha$ and $\beta$.
The asymptotic variances are also found. An application is made to certain problems of
astronomy.

6. Unbiased Estimates with Minimum Variance. Charles Stein, Columbia 
University. 

Let $X$ be a random variable distributed in the space $R$ according to one of the p.d.f.'s
$\varphi(x \mid \theta)$, where $\theta$ is an unknown parameter, and let $g(\theta)$ be a real-valued function of $\theta$.
Let $B(\theta)$ be the set of all $x$ such that $\varphi(x \mid \theta) > 0$ but $\varphi(x \mid \theta_0) = 0$, and $S$ the set of all $\theta$
such that $B(\theta)$ has probability 0 when $\theta$ is the true parameter value. Let

$$f(x \mid \theta) = \varphi(x \mid \theta)/\varphi(x \mid \theta_0) \quad\text{and}\quad A(\theta_1, \theta_2) = E\{f(X \mid \theta_1)\,f(X \mid \theta_2) \mid \theta_0\}$$

for $\theta_1$, $\theta_2$ in $S$. Suppose $A(\theta_1, \theta_2)$ is everywhere finite and there exists a set function $\lambda$
of bounded variation over $S$ such that $\int_S A(\theta_1, \theta_2)\,d\lambda(\theta_1) = g(\theta_2)$. Then an estimate of
$g(\theta)$, unbiased for all $\theta$ in $S$ and having minimum variance at $\theta_0$, is given by
$f(x) = \int_S \varphi(x \mid \theta)\,d\lambda(\theta)/\varphi(x \mid \theta_0)$. The minimum variance is $\int_S g(\theta)\,d\lambda(\theta) - [g(\theta_0)]^2$. If the
definition of $f(x)$ is modified at a set having probability 0 when $\theta = \theta_0$, the properties on $S$
and at $\theta_0$ remain unchanged. Under mild restrictions this alteration can be carried out so
as to make $f(x)$ an unbiased estimate of $g(\theta)$ for all $\theta$. The results are related to the work of
Fisher, Dugué, Rao, and Bhattacharyya on the amount of information.

7. Sufficient Statistics and a System of Partial Differential Equations. (A
Contribution to the Neyman-Pearson Theory of Testing Hypotheses.) Preliminary
Report. Erich L. Lehmann, University of California, Berkeley, and
Henry Scheffé, University of California, Los Angeles.

In the Neyman-Pearson theory of testing hypotheses the problem of the existence and
determination of similar regions has been treated under two approaches: (1) Assuming the
existence of a set of sufficient statistics for the nuisance parameters; (2) Assuming that the
probability density satisfies a certain system of partial differential equations. By solving
the differential equations it is now shown that they imply the existence of sufficient statistics
for the nuisance parameters. Knowledge of the form of the solution of the differential
equations permits simplification of the known theory of optimum tests (type $B$, $B_1$, etc.)
as well as some generalization.


8. Power Function of the Analysis of Variance and Covariance Test of a 
Normal Bivariate Population. W. M. Chen, University of California, Berkeley. 

The problem of finding the power function of the analysis of variance and covariance test
of a normal bivariate population, $\rho = 0$ and $\sigma_x = \sigma_y$, by means of the principle of likelihood
was reduced to the determination of the distribution function $P(L)$ of the following moment
problem:


f 


L k d P(L) 


(1 - 

KV) 




Mk,r t 


(k - 1 , 2 , .. 0 , 


where 


r ( !L y- 2 ) r ^ r (n - l + 2* + r) 

and a, the argument of the power function, lies in the interval (0,1) and vanishes only when 
the hypothesis tested is true. The moment problem was found and solved by rather tricky 
methods. The result is 


P(L ) 




, * + 1 


) 


where b =» 

9. A Mathematical Model of the Relation between White and Yolk Weights 
of Birds’ Eggs. G. A. Baker, University of California, Davis. 

The purpose of such a model is to find a rational method of estimating a “best line” in
some sense which will represent the relation between white and yolk weights for some or all
species of birds. From data at hand it appears that birds within species may differ in
means and variances of weights and that the yolk and white weights are positively correlated.
Yolk and white weights within a species are functions of egg number. The standard
deviations of yolk and white weights for different species are approximately proportional
to mean values. The “true” means for yolk and white weights for different species do
not lie on a line because of biological differences between species with the same egg size.
The standard deviation of species deviations from a straight line depends on the size of the
egg (may be proportional to a weighted sum of the yolk and white weights). If sampling
variances are sufficiently small they may be neglected and a straight line fitted assuming
both variables subject to error and non-uniform variance. The practicality of maximum
likelihood estimates is considered.

10. Statistical Analysis for a New Procedure in Sensitivity Experiments. A. 

M. Mood, Iowa State College, and W. J. Dixon, University of Oregon. 

In the language of biological assay the sensitivity experiment investigates the proportion
of subjects that respond to a given concentration, $x$, of a certain chemical. It is assumed
that only one test may be made on each subject. The new procedure is characterized by a
change in $x$ for each successive test, depending on the result of the preceding test: $x$ is
reduced to the next lower of a fixed set of concentrations for the next test if there is a
response and is increased to the next higher concentration if there is no response. Observations
are thus concentrated near the mean and few tests are made for values of $x$ where a
very large or very small proportion of subjects would respond. Assuming $x$ is normally
distributed, approximate maximum likelihood estimates are obtained for the mean and
standard deviation of $x$. These assume a form which is simple to compute. Choice of optimum
increments of $x$ for various situations is investigated.

11. The Relation of Inbreeding to Calf Mortality. W. M. Regan, S. W. 
Mead, and P. W. Gregory, University of California, Davis. 

An analysis of calf mortality in the University of California dairy cattle breeding experiment
is presented. Calves up to 4 months of age that were born singly are included in the
study. Only those stillbirths and abortions from cows free from Brucellosis and health
and reproductive abnormalities were considered. A total of 774 Jersey and 258 Holstein
calves were included. Calves were classified according to inbreeding coefficients as follows:
Class I, the controls, 0.0 to 0.1249; Class II, 0.125 to 0.2448; Class III, 0.245 to 0.3749; and
Class IV, 0.375 and over. There was no relation between the number of abortions and the
degree of inbreeding. The stillbirths, too few to be statistically significant, tended to increase
as the coefficient of inbreeding increased. Following birth, however, mortality was correlated
with inbreeding of both males and females; for the males it was greater than for
the females in Classes III and IV, but the difference is hardly significant. The Jerseys
tended to be less viable than the Holsteins. Some of the increased mortality of the more
highly inbred animals could be accounted for by the action of two lethal genes, one controlling
an anomaly of the liver, the other an anomaly of the heart; there was no plausible
explanation for most of it. Within sex, inbreeding class, and breed there was considerable
variation in the mortality of the progeny of different sires. Some of these differences were
statistically significant.

12. Observations on Designs for Cooperative Field Tests. P. A. Minges, 
University of California, Davis. 

In California conditions vary so greatly between the principal production areas that it
is necessary to establish experimental plots in each of the areas if reliable information is
to be obtained regarding cultural practices. Most of these tests must be conducted on
ranches in cooperation with growers and local agricultural extension agents. The designs
of these tests should be relatively simple, the arrangement should be adjustable to work
into the growers’ cultural practices and to permit the obtaining of yield records with a
minimum of interference to the growers’ operations, yet the design must be adequate to
yield valid data. The randomized block design has proved the most useful, although paired
plots, factorials, split-plots and Latin squares have been used successfully under certain
conditions. The Latin square design is useful when a two-way variation is expected; otherwise
it is not usually very efficient. Where yield data are of prime importance, four replications
have been considered most practical. In tests such as variety trials when factors
other than yields are important, two replications may be adequate. The size of the plot
has been varied to fit the crop, conditions of the field, and known soil variability. Plots
two rows wide and 50 to 135 feet long often have been used, frequently without guards
between plots. Since it is desirable to include checks (untreated controls) in most tests,
small plots will reduce the loss to the growers when the treatments prove beneficial. The
information derived from these tests is of most interest to growers and county agents, so
the data should be presented in tables that are easily read. The variability figure, which is
confusing to most people, probably can best be presented as the least significant difference.

13. Population Genetics. N. H. Horowitz, California Institute of Tech¬ 
nology. 

Population genetics attempts to describe the effects on the genetical structure of Mende- 
lian populations of factors such as mutation, selection, migration, and random fluctuations 
due to sampling errors. These diverse elements are brought under a common viewpoint by 
considering their effects on gene frequencies. Since change in gene frequency is the ele¬ 
mentary process of evolution, the above factors are causal agents of evolution. Mathe¬ 
matical models illustrating the interplay of the various elements have been constructed by 
Wright, Haldane, and Fisher. The nature of Mendelian inheritance is such that gene 
frequencies remain constant in large populations not subject to net mutation, selection, or 
migration pressures. Unbalanced pressures initiate evolutionary changes which continue 
until equilibrium is reached at a new level of gene frequencies. Equilibrium frequencies are 
determined by opposing pressures—e.g., opposing mutation rates, mutation opposed by 
selection, etc. Equilibrium, stable or unstable, is also possible under selection alone. In 
small populations, sampling errors among the gametes produce random fluctuations in gene 
frequencies which, superimposed on the equilibrium values, result in probable distributions 
of frequencies. The latter provide a mechanism for the evolution of characters, especially 
biochemical syntheses, which depend on the simultaneous action of a number of individually 
non-adaptive genes. 
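
The equilibrium under opposing mutation rates mentioned above can be illustrated with a minimal sketch. It assumes a two-allele model in which A mutates to a at rate u and a mutates back to A at rate v per generation; the function names and the numerical rates are hypothetical and are not taken from the abstract.

```python
def next_frequency(q, u, v):
    """Frequency of allele a after one generation of two-way mutation.

    Assumed model: A -> a at rate u, a -> A at rate v, no selection,
    no migration, and a population large enough to ignore sampling error.
    """
    return q + u * (1.0 - q) - v * q


def equilibrium_frequency(u, v):
    """Frequency at which the two opposing mutation pressures balance."""
    return u / (u + v)


def iterate(q0, u, v, generations):
    """Follow the gene frequency over a number of generations."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, u, v)
    return q


if __name__ == "__main__":
    # Illustrative rates only; real mutation rates are far smaller.
    print(iterate(0.9, 0.02, 0.01, 2000), equilibrium_frequency(0.02, 0.01))
```

With both rates set to zero the frequency stays at its starting value, which corresponds to the constancy of gene frequencies in large populations free of net pressures.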


14. The Choice of Inspection Stringency in Acceptance Sampling by Attributes. 

J. L. Hodges, Jr., University of California, Berkeley. 

In acceptance sampling by attributes, the probability p that an item will be defective is 
taken to be a function g(x, y) of the quality x of the population and the stringency y of 
inspection. Let n, the number of items inspected, be fixed, and reject if the number of 
defectives is ≥ k. It may then be possible to satisfy a condition on the power function
with different values of k, by adjusting y properly. This paper is concerned with the choice 
of k and y in such situations. A criterion is given, and it is shown that the criterion is 
approximately satisfied by $k \approx [n\,g(x_0, y)]$, where $x_0$ separates acceptable and non-acceptable
values of x, and y maximizes

$$\frac{g_x(x_0, y)}{\sqrt{g(x_0, y)\,[1 - g(x_0, y)]}}.$$

An asymptotic property of this approximation is shown. The method is applied to two
examples: (a) testing the mean bacterial density x of a liquid by the dilution method, y
being the volume of liquid incubated, and (b) testing the variance x of a normally distributed
dimension of known mean m by applying gauges set at $m \pm 1/y$. The approximate
solution is found to be satisfactory in both cases for n = 20.
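
The power function of such a plan can be sketched as follows. The binomial model for the number of defectives follows from the description above; the particular form of g(x, y) used here (the chance that a volume y from a liquid of density x contains at least one organism, under a Poisson assumption) is an illustrative assumption for example (a), and all function names are hypothetical.

```python
from math import comb, exp


def defective_probability(x, y):
    """Hypothetical g(x, y) for the dilution example.

    Assumes the number of organisms in a volume y is Poisson with mean x * y,
    so a sample shows growth with probability 1 - exp(-x * y).
    """
    return 1.0 - exp(-x * y)


def rejection_probability(n, k, p):
    """P(number of defectives >= k) among n items, each defective with probability p."""
    return sum(comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(k, n + 1))


def approximate_k(n, x0, y):
    """Integer part of n * g(x0, y), the approximation quoted in the abstract."""
    return int(n * defective_probability(x0, y))


if __name__ == "__main__":
    n, x0, y = 20, 1.0, 0.5
    k = approximate_k(n, x0, y)
    for x in (0.5, 1.0, 2.0):
        print(x, rejection_probability(n, k, defective_probability(x, y)))
```

Tabulating the rejection probability against x for several pairs (k, y) shows how different stringencies can meet the same condition on the power function.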





15. The Application of Learning Curves to Industrial Planning. Preliminary 
Report. James R. Crawford, Lockheed Aircraft Corporation. 

Learning curves are significant factors of analysis in industries producing quantities of 
less than 20,000 units of a given article. Ship-building and airframe manufacture are the 
two largest industries in this class. Learning curves occur where job costs are kept either 
by individual unit or by lot, and also where achievement is measured against a standard. 
Cost per unit plots against ordinal unit number as a straight line on logarithmic graph
paper. Learning curves are used to supplement time-studies, to determine the capacity of
tooling, to lay out budgets, and for estimating and bidding. The experience of individual
workers and management is reflected in these analyses. The slope of the learning curve
is related to the amount to be learned. Plateaus occur which are related to the hiring of 
new workers and to the relaxing of control measures. Other consistent minor patterns 
occur which are related to specific conditions. Equations have been derived and tables 
computed for five related forms of the learning curve. Graphic methods are satisfactory 
except for bidding. This study covers a simple approach to an important problem of industrial
management. The findings in the industrial field may benefit research in the field
of the psychology of learning. 
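
A straight line on logarithmic paper corresponds to a power law, cost = a * n**b, where n is the ordinal unit number. The sketch below fits that line by ordinary least squares on the logarithms; the fitting method, the function names, and the cost figures are illustrative assumptions and are not the equations or tables of the report.

```python
from math import exp, log


def fit_learning_curve(unit_numbers, unit_costs):
    """Least-squares line through (log n, log cost).

    Returns (a, b) such that cost is approximately a * n**b.
    """
    xs = [log(n) for n in unit_numbers]
    ys = [log(c) for c in unit_costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


def predicted_cost(a, b, n):
    """Cost of the n-th unit on the fitted curve."""
    return a * n**b


# Hypothetical unit costs that fall by 20 percent each time output doubles.
units = [1, 2, 4, 8, 16]
costs = [100.0, 80.0, 64.0, 51.2, 40.96]
a, b = fit_learning_curve(units, costs)
doubling_ratio = 2**b  # cost of unit 2n divided by cost of unit n

if __name__ == "__main__":
    print(a, b, doubling_ratio, predicted_cost(a, b, 32))
```

The exponent b is the slope of the line on logarithmic paper, and 2**b is the ratio of unit cost before and after a doubling of output.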

16. Relative Effects of Inbreeding and Selection in Poultry. W. O. Wilson, 
University of California, Davis. 

Egg production rate, fertility, hatchability, and chick mortality records from the Iowa 
State College Poultry Department’s inbreeding project were studied. Statistics which 
were calculated from the data included simple and partial regression of traits on inbreeding, 
estimates of heritability by correlation between paternal half-sibs and by daughter-dam 
regressions, and selection differentials. The net genetic gain or loss in merit per generation 
was considered to be the sum of the product of selection differentials and heritability, plus 
the product of regression of trait on inbreeding and increase in amount of inbreeding. The
amount of inbreeding that each trait can sustain with no net loss or gain was estimated.
Of the traits studied, the rank was in the following order: hatchability,
chick mortality, fertility, and egg production.
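
The accounting described above can be written out as a short sketch. It assumes a single selection differential per generation and a negative regression of the trait on inbreeding; the function names and the numbers in the example are hypothetical.

```python
def net_genetic_gain(selection_differential, heritability,
                     regression_on_inbreeding, inbreeding_increase):
    """Net gain per generation: selection gain plus the inbreeding effect."""
    selection_gain = selection_differential * heritability
    inbreeding_effect = regression_on_inbreeding * inbreeding_increase
    return selection_gain + inbreeding_effect


def break_even_inbreeding(selection_differential, heritability,
                          regression_on_inbreeding):
    """Increase in inbreeding at which selection gain and inbreeding loss cancel.

    Assumes the regression of the trait on inbreeding is negative.
    """
    if regression_on_inbreeding >= 0:
        raise ValueError("break-even is defined only for a negative regression")
    return -selection_differential * heritability / regression_on_inbreeding


if __name__ == "__main__":
    # Hypothetical figures in arbitrary trait units.
    delta_f = break_even_inbreeding(10.0, 0.2, -0.5)
    print(delta_f, net_genetic_gain(10.0, 0.2, -0.5, delta_f))
```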

17. The Rate of Genetic Gain in Egg Production in Progeny-Tested Flocks 
as a Function of the Interval between Generations. Everett R. Dempster and 
I. Michael Lerner, University of California, Berkeley. 

The rate of genetic gain in a character for which selection is practiced depends, in addition
to the intensity of selection, on (1) the accuracy of selection, and (2) the average interval
between generations. These factors are not independent and exercise a pull in opposite
directions. Through the application of Wright’s technique of path coefficients, comparisons
can be made between the expected rates of genetic gain in populations containing varying 
proportions of breeding animals of different ages. The methods used involve the estimation 
of correlations between genotypes, and various selection indexes based on individual, sib 
and progeny records in unculled populations as well as in populations whose range has been
restricted by previous selection. From these estimates the relative efficiencies of different 
age distribution schemes of a breeding population can be determined. A specific solution 
for such a situation in a flock bred for egg production will be presented as an illustration of 
the problems and methods used in the study of the genetics of populations under artificial 
selection. 

18. Statistical Criteria of the Effectiveness of Selective Procedures. Preliminary
Report. R. F. Jarrett, University of California, Berkeley.





The “validity coefficient,” the standard error of estimate, the index of predictive efficiency,
the “selection ratio” of Taylor and Russell, Johnson’s Gamma, and other statistical
devices have been suggested as indices of the effectiveness of selective programs. These 
devices all suffer from the deficiency that they do not permit a satisfactorily precise estimate 
of the dollar value of the increased output expected from the selection program and thus 
leave unsettled the question as to whether or not the cost of such a program is justified. 
The relationship between the correlation coefficient on the one hand and the mean value of 
Y for an unselected population (Y being an objective output-type criterion), the standard
deviation of Y for an unselected population, and the mean value of Y for the upper Np individuals
selected on the basis of their high performance on the selective test X on the other
hand, provides the basis for estimating the increase in the mean output of a group of workers
selected on the basis of a testing program yielding any specified validity coefficient with the
criterion Y. Increase in productivity of selected workers is shown to be a function of the
validity coefficient, the rigorousness of selection, and the coefficient of variability of the 
output criterion among “unselected” employees. 
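
One way to make the stated dependence concrete is the sketch below. It assumes that test score X and output Y are jointly normal, which the abstract does not state, and that the top fraction p of applicants on X is selected; under that assumption the expected gain in mean output is r * sd(Y) * phi(z) / p, where z is the cutoff in standard units and phi is the standard normal density. The function names and figures are hypothetical.

```python
from statistics import NormalDist


def expected_output_gain(validity, selected_fraction, sd_output):
    """Expected rise in mean output of the selected group over the unselected mean.

    Assumes bivariate normality of test score and output; selected_fraction
    must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selected_fraction)
    # Mean standard score on the test among those above the cutoff.
    mean_selected_score = std_normal.pdf(cutoff) / selected_fraction
    return validity * sd_output * mean_selected_score


def proportional_gain(validity, selected_fraction, coefficient_of_variation):
    """Gain as a fraction of mean output, using sd / mean of unselected output."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selected_fraction)
    return validity * coefficient_of_variation * std_normal.pdf(cutoff) / selected_fraction


if __name__ == "__main__":
    # Hypothetical case: validity 0.5, upper half selected, sd of output 10 units.
    print(expected_output_gain(0.5, 0.5, 10.0))
```

Multiplying the gain by the dollar value of a unit of output and by the number of workers selected gives a figure that can be set against the cost of the testing program.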

19. Approaches to Univocal Factor Scores. Preliminary Report. J. P. 
Guilford, University of Southern California. 

In spite of the fact that univocal factor scores are badly needed for various reasons, it 
appears to be impossible by present methods to construct pure tests for some common 
factors. Recourse must therefore be made to statistical control of component variances. 
It is desirable to derive each factor score from a minimum number of tests. The availability 
of a few univocal tests makes this requirement fairly easy to satisfy. Such tests serve well 
as suppression variables for their common-factor variances where not wanted in other tests. 
Several principles may be invoked as objectives: (1) to maximize the desired variance in 
the impure test, (2) to reduce the undesired variance to zero, or (3) to minimize the undesired 
variance without intolerable loss of the desired variance. A secondary objective is to 
assure a combining weight of +1.00 for the test measuring the desired factor. Equations 
for achieving the objectives have been derived and the limitations and implications of each 
procedure have been noted. By means of statistical control, the situation seems hopeful 
for the achievement of univocal scores for a fairly large number of unique psychological 
variables. There are implications for experimental psychology as well as for vocational 
testing. 

20. A Note on the Problem of Binary Stars. Elizabeth L. Scott, University
of California, Berkeley.

This paper concerns some of the problems of Trumpler (see next abstract). $\xi_{ij}$ is the
radial velocity of the i-th star, $i = 1, 2, \ldots, s$, at $t_j$ selected at random, $j = 1, 2, \ldots, n$.
$x_{ij}$, the measurement of $\xi_{ij}$, is $N(\xi_{ij}, \sigma_i^2)$. $\xi_{ij}$ is random with distribution
$c\,(k_i^2 - (\xi_{ij} - \xi_{i0})^2)$, where $k_i \ge 0$ and $\xi_{i0}$ are unknown. (1) Test of hypothesis
that $k_i = 0$. Case (i) $\sigma_i$ known. Whatever the exact test T, its power $\beta_T(k)$ has derivative
$\beta_T'(0) = 0$. Test maximizing $\beta_T''(0)$ is that of Trumpler with criterion
$S_i^2 = \sum_{j=1}^{n} (x_{ij} - x_{i\cdot})^2 > \chi^2 \sigma_i^2$. Case (ii) $\sigma_i$ unknown.
Whatever the exact test T, $\beta_T^{(m)}(0) = 0$, m = 1, 2, 3. Test maximizing $\beta_T^{(4)}(0)$ is
Trumpler’s test $\sum_{j=1}^{n} (x_{ij} - x_{i\cdot})^4 > \left[\sum_{j=1}^{n} (x_{ij} - x_{i\cdot})^2\right]^2 C$.
(2) Let $\sum_{j=1}^{n} (\xi_{ij} - \xi_{i0})^2 = 2\lambda_i \sigma_i^2$.
For constant velocity stars $\lambda = 0$. For others it is a random variable. Since, given $\lambda > 0$,
$S_i^2$ is distributed as non-central $\chi^2$, an integral equation connects the distributions of $S_i^2$
and $\lambda$. Its solution yields an estimate of the proportion of constant velocity stars. After
estimating the distribution of $\lambda$, the level of significance can be estimated and also the





number n of measurements so that the proportion of constant velocity stars declared variable
will be less than p, specified in advance.

21. Statistical Problems of Spectroscopic Binaries. Robert J. Trumpler, 
University of California, Berkeley. 

Spectroscopic Binaries are stars whose radial velocities, as measured by the Doppler 
shift of spectral lines, show a periodic variation. The first problem is to obtain a statistical 
criterion for deciding whether a star with several radial velocity measures, made at different 
times, has a high probability (larger than a specified limit) of variable velocity and should 
be announced as an object worthy of further study. The second problem is to find the 
percentage of variable velocity stars among a large list of stars with several radial velocity 
measures for each star. From the distribution of standard errors only the percentage of 
cases where the velocity variation exceeds a certain limit can be ascertained. The third 
problem is concerned with those stars for which a binary orbit has been determined. The 
statistical distribution of these binary systems according to mean distance between the two 
stars and the ratio of their masses can be evaluated within certain limits. 



NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Mr. Kenneth J. Arrow has been appointed Research Associate of the Cowles 
Commission. 

Dr. W. D. Baten, formerly of Michigan State College, is now Chief, Operations
Branch, Planning Section, Air Defense Command, Mitchel Field, New
York.

Dr. Paul T. Bruyere is now Chief of the Medical Records and Statistics Branch, 
Army Institute of Pathology, Office of the Surgeon General, War Department. 

Dr. A. C. Cohen received his discharge from the Army, with the rank of Lieutenant
Colonel, at the beginning of the spring quarter, and returned to his former
position at Michigan State College. He has accepted a position at the University
of Georgia beginning with the 1947 summer session there.

Dr. Hallett H. Germond has resigned from his position as professor of mathematics
at the University of Florida. He is now Director of Research for the
S. W. Marshall firm of Consulting Engineers, in New York City. 

Dr. Meyer A. Girshick, formerly with the Department of Agriculture, is now 
with the Douglas Aircraft Company in Santa Monica, California. 

Dr. Clyde H. Graves has accepted a position as Operations Analyst, Operations
Analysis, Air Defense Command, Mitchel Field, New York.

Dr. E. J. Gumbel has been appointed to an Associate Professorship at Brooklyn
College.

Dr. Trygve Haavelmo has returned to Norway, and is at the University 
Institute of Economics, Oslo. 

Mr. Joseph O. Harrison, Jr., is now employed as a Mathematician in the 
Computing Branch of the Ballistic Research Laboratories, Aberdeen Proving 
Ground. 

Dr. Wassily Hoeffding has accepted a position as Research Associate, The
Institute of Statistics, University of North Carolina, Chapel Hill. 

Mr. Cyrus A. Martin is now an administrative analyst and statistician, assisting
Chief of Personnel Control of Signal Corps, in Washington, D. C.

Mr. Jack I. Northam has accepted an Assistant Professorship in the Department
of Mathematics, Kansas State College, Manhattan, beginning with the
1947 summer session. 

Professor Henry Scheffé, who has been on leave for the past year, returned to
his position in the Engineering Department, University of California at Los 
Angeles, in June. 

Mr. Edward M. Schrock has accepted a position as Quality Control Engineer 
with the General Electric Company at their Erie Works, Erie, Pa. 

Mr. Jerome R. Steen, who has been manager of Quality Control Engineering
with the Sylvania Electric Products in Emporium, Pa., has now transferred with
the same company to Flushing, New York. 


Professor Emeritus Irving Fisher, of Yale University, died April 29, 1947, 
at the age of eighty. 


In connection with the Atlantic City meeting of the American Chemical 
Society, April 14-18, 1947, a symposium on Statistical Methods in Experimental
and Industrial Chemistry was held, in which several members of the Institute 
of Mathematical Statistics took part. The following program was presented 
Tuesday morning and afternoon, April 15: 

(1) Introductory Remarks. B. L. Clarke. 

(2) The Management Viewpoint. George Smith.

(3) A New Technique for Testing the Accuracy of Analytical Data. W. J.
Youden.

Discussion: Grant Wernimont, R. F. Moran, John Mandel, and Roland
H. Noel.

(4) Design of Experiments in Industrial Research. Hugh M. Smallwood.

(5) Statistical Training for Industry. Samuel S. Wilks. 

Discussion: John Tukey, E. V. Lewis, Churchill Eisenhart, and C. West 
Churchman. 


Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Actuarial Society of America
and the American Institute of Actuaries to the nine undergraduates ranking 
highest in the combined score on Part 1 and Part 2 of the 1947 Preliminary 
Actuarial Examinations are as follows: 


First Prize of $200
James H. Chung               University of Toronto

Additional Prizes of $100
James F. A. Biggs            Yale University
George Y. Cherlin            Rutgers University
Frank H. David               Harvard University
Thomas M. Galt               University of Manitoba
Charles F. Pinzka            Rutgers University
Philip C. Rapp               University of Buffalo
Morton K. Schwartz           Brown University
James G. C. Templeton        University of Toronto

The two actuarial organizations authorized a similar set of nine prize
awards for the 1948 Examinations.





The Preliminary Actuarial Examinations consist of the following three examinations:

Part 1. Language Aptitude Examination 

(Reading comprehension, meaning of words and word relationships, antonyms, 
and verbal reasoning). 

Part 2. General Mathematics Examination 

(Algebra, trigonometry, coordinate geometry, differential and integral 
calculus). 

Part 3. Special Mathematics Examination 

(Finite differences, probability and statistics). 

The 1948 Examinations will be administered by the College Entrance Examination
Board at centers throughout the United States and Canada on May 14-15,
1948. 


Correction 

In the Directory of Members published in Vol. XVII, No. 4 (December 1946)
Professor Joseph Kampé de Fériet’s name is listed in the F’s under Fériet. It
should have appeared in the K’s, under Kampé de Fériet.


New Members 

The following persons have been elected to membership in the Institute (March 1 to May 30, 1947):

Adams, Joe K. Ph. M. (Wisconsin) Graduate student and half-time instructor in Psychology,
Graduate College, Princeton University, Princeton, N. J.

Adams, Walter B. Communications Analyst, Civil Aeronautics Admin., Dept. of Commerce,
8253 S. Ingleside Ave., Chicago 19, Ill.

Aitken, Alexander C. D.Sc. (Edinburgh) Professor of Mathematics, University of Edinburgh,
23 Stirling Road, Edinburgh 5, Scotland

Brambilla, Francesco Ph.D. (Univ. L. Bocconi) Lecturer in Math. Statistics, Institute 
of Statistics, University L. Bocconi, 6 via Panzacchi, Milano, Italy 

Brown, George Middleton, D.Sc. (Michigan) Asst. Prof. of Math., Mich. State College, East
Lansing, Mich., 633 Cherry Lane 

Bueno, Luiz de Freitas, E.E. (Mackenzie Coll.) Professor da Universidade de Sao Paulo, 
Brazil, Rua ItanM 341, Casa 13 

Burke, Cletus J., M.A. (U.C.L.A.) Res. Ass't, Univ. of Iowa, Iowa City, Iowa, 118 Riverside
Park

Cameron, Joseph M., M.S. (N. Car. State) Room 302 South Building, National Bureau of 
Standards, Washington, D. C. 

Carpenter, Osmer M.S. (Iowa State) Instructor, Mathematics Department, Iowa State 
College, Ames, Iowa 

Castellani, Maria D.Sc. (Rome) Visiting Professor, Department of Mathematics, University
of Kansas City, Kansas City 4, Mo.

Chernoff, Herman Sc.M. (Brown) National Research Council Pre-Doctoral Fellow, 3003
Wallace Ave., Bronx 67, N. Y. 

Clark, Stanley M.Ed. (Saskatchewan) Student and teaching assistant, 1301-7th St., S.E., 
Minneapolis 14, Minn. 





Cover, John H. Ph.D. (Columbia) Director, Bureau of Business and Economic Res., 
Univ. of Maryland, College Park, Md. 

Dailey, John T. M.S. (N. Texas Teachers Coll.) Res. Psychologist (Aviation), Psychological
Res. and Examining Unit, Sqn. E, Indoctrination Div., Air Training Command,
San Antonio, Texas 

Darling, Donald A. Ph.D. (Calif. Inst. Tech.) Teaching Ass’t, Calif. Inst. of Technology,
Pasadena 4, Calif. (As of July 1947, Dept. of Math., Cornell Univ., Ithaca, N. Y.)

Darmois, Georges D.Sc. (Paris) Prof. à la Faculté des Sciences de Paris, 7 Rue de l'Odéon,
Paris 6, France

Davies, J. Alfred M.A. (Alabama) Statistician, Design Eng. Section, General Electric
Co., 708 Hill Avenue, Owensboro, Kentucky

Dunnett, Charles W. M.A. (Toronto) Student, 1044 John Jay Hall, Columbia Univ., 
New York 27, N. Y. 

Egermayer, Frantisek Sc.D. (Charles Univ., Prague) Chief of Section, State Statistical
Office, 2 Bělského, Prague VII, Czechoslovakia.

Fickenscher, Edgar H. A.B. (Calif.) Graduate student and teaching ass’t, Univ. of Calif., 
1490 Acton St., Berkeley 2, Calif. 

Fraga, Constantino G. Jr. (Sao Paulo) Head, Dept. of Statistics, Institute Agronomico,
Campinas (S.P.), Brazil. 

Frank, Elmore J. B.A. (Chicago) Instr. in Statistics, Ill. Institute of Tech., and Statistician,
Commercial Res. Dept., Armour and Co., 6428 Maryland Ave., Chicago 15, Ill.

Frisch, Ragnar Ph.D. (Oslo) Professor, University Institute of Economics, Oslo, Norway. 

Geary, Robert C. D.Sc. Superintending Officer, Statistics Branch, Dept. of Industry and
Commerce, 27 Leeson Park, Dublin, Ireland.

Goodman, John R. M.S. (Iowa State) Head, Sampling Section, Survey Res. Center, 
Univ. of Mich., Ann Arbor, Mich. 

Gutman, Pierre M.A. (Columbia) Student, 7 Mountain Ave., Maplewood, N. J. 

Hartline, H. K. M.D. (Johns Hopkins) Assoc. Prof. of Biophysics, Johnson Res. Foundation,
Univ. of Pennsylvania, 36th and Spruce Sts., Philadelphia, Pa.

Hartog, Jacob A. (Rotterdam) Rockefeller Fellow, 25 Follen St., Cambridge, Mass.

Jacobs, Marcus A.B. (Penn.) Health Statistician, 4429 S. 86th St., Arlington, Va.

Jeeves, Terry A. A.B. (Calif.) Teaching ass’t in math., Univ. of Calif., 2511 Hearst Ave., 
Berkeley 9, Calif. 

Kempthorne, Oscar M.A. (Cambridge, England) Res. Assoc. Prof., Statistical Lab., Iowa 
State College, Ames, Iowa 

Kendall, David G. M.A. (Oxford) Fellow, Magdalen Coll., Oxford, England 

Kent, Leonard M.B.A. (Chicago) Instr. in Statistics, School of Business, Univ. of Chicago,
Chicago 37, Ill.

Kupperman, Morton B.S. (C.C.N.Y.) Statistician, Office of the Surgeon General, War 
Dept., 2829-27th St., N.W., Washington 8, D. C.

Lahti, Elizabeth L. M.A. (Michigan) Statistician, Bur. of Measurement and Guidance,
Carnegie Institute of Technology, Pittsburgh 13, Pa. 

Levine, Harry D. B.S. (Chicago) Instr., Long Island Univ., 164 W. 96 St., New York 25, 
N. Y. 

Lichtenstein, Morris B.A. (Michigan) Statistician, 4811 N. Capitol St., N.E., Washington 
11, D. C. 

McMillan, Brockway Ph.D. (Mass. Inst. Tech.) Member, Technical Staff, Bell Telephone
Labs., Murray Hill, N. J.

Marshall, Andrew W. Student, 5757 University Ave., Chicago 37, Ill.

Metzner, Charles A. Ph.D. (Wisconsin) Study Director, Survey Research Center, Univ. 
of Michigan, Ann Arbor, Mich. 

Norton, John W. B.S. (California) Lab. Supervisor, Union Oil Co. of Calif., 5529 Macdonald
Ave., Richmond, Calif.





Otter, Richard Ph.D. (Indiana) Instructor, Fine Hall, Princeton Univ., Princeton, N. J. 

Passos, Helena Rocha Penteado Director de Divisão do Depto. Estadual de Estatística de
São Paulo, Avenida Angélica, 160, Apto. 6, São Paulo, Brazil

Priest, Edward I. B.S. (Columbia) Student in mathematics, 1204 E. 55th St., Chicago, Ill.

Quensel, Carl-Erik Fil.Dr. (Lund) Prof. at the University, Lund, Sweden, Linnegatan
14 

Rankin, Mozelle M.A. (Ohio State) Ass’t Instructor, Ohio State Univ., 107-14th Ave.,
Columbus 1, Ohio 

Robb, Richard A. D.Sc. (Glasgow) Mathematics Lecturer and Mitchell lecturer in 
Statistics, Univ. of Glasgow, Glasgow, W. 2, Scotland.

Ruist, Erik Fil.kand. (Stockholm) Amanuens, Industriens utredningsinstitut, Stockholm
16, Sweden

Sahni, Inder M. M.A. (Punjab) Rothamsted Experimental Station, Harpenden, Herts,
England. 

Schneider, B. Aubrey Sc.D. (Johns Hopkins) Ass’t Director, Dept. of Statistics and
Special Services, American Cancer Society, 47 Beaver St., New York 4, N. Y. 

Seitz, Jiri Ph.D. (Prague) Kouřimská 8, OSIi, Praha XII, Czechoslovakia.

Simaika, Jacques B. Ph.D. (London) Lecturer, Faculty of Science, Fuad I University, 
Abbassia, Cairo, Egypt. 

Slatin, Benjamin M.A. (Columbia) Jr. Analyst, Econometric Institute, 179 Peshine 
Ave., Newark 8, N.J.

Suydam, Bergen R. A.B. (N.Y. State Coll. for Teachers) Graduate student, Columbia
University, 1 W. 706 St., Shanks Village, Orangeburg, N. Y. 

Tashmuhamed, Sarymsakov Ph.D. (Moscow) President of the Academy of Sciences of 
Uzb.SSR, Professor of the University, Tashkend, ul. Abdulli Tukaeva /, Tashkent, 
USSR 

Travers, Robert M. W. Ph.D. (Columbia) Examiner, and Assoc. Prof. of Education,
Bureau of Psychological Services, Univ. of Mich., Ann Arbor, Mich. 

Weiner, Sidney B.S. (C.C.N.Y.) Student, New York University Graduate School, 1689 
East 17th St., Brooklyn 80, N. Y. 

Wezelman, Sol M. A.B. (Michigan) Graduate student, University of Michigan, Ann 
Arbor, Mich., 2432 Burt St., Omaha, Nebr. 

Wishart, John D.Sc. (London) Reader in Statistics, School of Agriculture, Cambridge, 
England 



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The Twenty-Sixth Meeting of the Institute of Mathematical Statistics was 
held in New York City on Thursday, April 24, and Friday, April 25, 1947, and 
was co-sponsored by the American Mathematical Society. This meeting was 
devoted to a program on Stochastic Processes and Noise. The attendance of 
190 persons included the following 75 members of the Institute: 

F. S. Acton, C. B. Allendoerfer, F. L. Alt, T. W. Anderson, Jr., L. A. Aroian, W. D. Baten,
Robert Bechhofer, J. H. Bigelow, D. H. Blackwell, Paul Boschan, G. W. Brown, R. S. Burington,
B. H. Camp, E. W. Cannon, A. G. Carlton, K. L. Chung, P. C. Clifford, D. D. Cody,
Harald Cramér, H. B. Curry, J. H. Curtiss, R. L. Dietzold, J. L. Doob, Jacques Dutka,
Churchill Eisenhart, Benjamin Epstein, Will Feller, M. M. Flood, Bernard Friedman, C. P.
Gerschenson, H. H. Goode, C. H. Graves, E. J. Gumbel, T. E. Harris, Millard Hastay, L. H.
Herbach, P. G. Hoel, Mark Kac, R. D. Keeney, T. C. Koopmans, William Kruskal, Jack
Laderman, J. E. Lieberman, S. B. Littauer, Melitta Lowy, P. J. McCarthy, Brockway McMillan,
Frederick Mosteller, L. F. Nanni, P. M. Neurath, G. E. Noether, M. L. Norden, C.
O. Oakley, P. S. Olmstead, G. B. Price, J. S. Rhodes, John Riordan, Selby Robinson, Frank
Saidel, Arthur Sard, F. E. Satterthwaite, G. R. Seth, C. E. Shannon, Jack Sherman, W. A.
Shewhart, Rosedith Sitgreaves, Andrew Sobczyk, Milton Sobel, Emma Spaney, C. M.
Stein, J. W. Tukey, D. F. Votaw, Jr., B. T. Weber, S. S. Wilks, Jacob Wolfowitz.

The first session was held on Thursday morning, with Professor Carl Allendoerfer
of Haverford College serving as chairman. The following program
was presented: 

Stochastic Processes — 

Description, Professor J. L. Doob, Columbia University 

Estimation, Professor Will Feller, Cornell University 

Prediction, Professor N. Wiener, Massachusetts Institute of Technology 

This meeting was concluded with a discussion by Dr. H. W. Bode, Bell Telephone 
Laboratories, Professor Mark Kac, Cornell University, and Professor A. Wald, 
Columbia University. 

Dr. S. O. Rice, Bell Telephone Laboratories, was chairman of the Thursday 
afternoon session. The following program was presented: 

Stochastic Processes in Some Applications — 

In Economics, Dr. T. Koopmans, Cowles Commission
In Insurance, Professor H. Cramér, Yale University
In Cosmic Radiation, Professor N. Arley, Princeton University
In Nuclear Physics, Dr. S. M. Ulam, Los Alamos Laboratory 

The final session was held on Friday morning with Professor J. W. Tukey 
of Princeton University as chairman. The program was as follows: 

Different Ways of Describing Noise — 

By a Noise Spectrum, Dr. C. E. Shannon, Bell Telephone Laboratories
By a Single Function, Mr. J. H. Bigelow, Institute for Advanced Study
By Many Functions, Professor Mark Kac, Cornell University
Round Table on Interrelations, Messrs. Shannon, Bigelow, Kac, and Rice

P. S. DWYER, 
Secretary. 





REPORT ON THE APRIL MEETING OF THE INSTITUTE IN 
ATLANTIC CITY 

The Twenty-Seventh Meeting of the Institute of Mathematical Statistics was 
held in cooperation with the Eastern Psychological Association, on Saturday 
morning, April 26, 1947, in Atlantic City. This meeting was a Round Table 
on Certain Recent Statistical Developments, and its attendance of approximately
100 persons included the following 9 members of the Institute: 

F. S. Acton, J. W. Dunlap, Benjamin Epstein, Irving Lorge, P. J. McCarthy, Frederick 
Mosteller, P. J. Rulon, F. E. Satterthwaite, and Emma Spaney. 

Professor Bernard Riess of Hunter College was chairman of the meeting. 
The following program was presented: 

Papers: Sequential Analysis. 

Dr. Irving Lorge, Teachers College, Columbia University 
Staircase Methods. 

Dr. Philip J. McCarthy, Cornell University 
Inefficient Statistics. 

Dr. Frederick Mosteller, Harvard University 

Discussion: Dr. Jack W. Dunlap, Psychological Corporation 

Dr. Leon Festinger, Massachusetts Institute of Technology 

Dr. William E. Kappauf, Princeton University 

Dr. Joseph Zubin, New York Psychiatric Institute Hospital 

P. S. DWYER, 
Secretary. 





REPORT ON THE SAN DIEGO MEETING OF THE INSTITUTE 

The first Western Regional meeting of the Institute of Mathematical Statistics 
was held in San Diego, California, June 17-19, 1947, jointly with the American 
Association for the Advancement of Science. The meeting was attended by 53 
persons, including the following 31 members of the Institute: 

G. A. Baker, Joseph Berkson, Z. W. Birnbaum, H. C. Carver, Harald Cramér, J. R.
Crawford, Dorothy Cruden, W. J. Dixon, Robert Dorfman, M. W. Eudey, Evelyn Fix, 
John Gurland, J. L. Hodges, Jr., J. M. Howell, H. M. Hughes, E. S. Keeping, E. L. Lehmann, 
R. H. Lien, F. J. Massey, G. F. McEwen, Frederick Mosteller, S. W. Nash, Jerzy Neyman, 
Kathryn B. Rolfe, Henry Scheffé, Herbert Solomon, C. M. Stein, Zenon Szatrowski, H. M.
Walker, J. D. Williams, Zivia S. Wurtele. 

The afternoon session on June 17 was a joint meeting with the Group of 
Former Operations Analysts. The following program was presented under the 
chairmanship of Col. Roscoe C. Wilson: 

Topic: Statistical Problems in Operations Analysis. 

Papers: Engineering and Statistics at the Pacific Front in World War II. 

Roger Wilkinson, Bell Telephone Laboratories, New York City. 

Present Organization and Activities of Operations Analysis. 

Leroy A. Brothers, Operations Analysis, Asst. Chief of Air Staff-3, Washington, D. C.

Statistical Evidence of Bomb Release Malfunctions. 

Mark W. Eudey, University of California, Berkeley. 

Study of Effectiveness of Certain Bombs Used Against German Industrial Targets. 

J. Neyman, University of California, Berkeley. 

The morning session on June 18 was presented with Professor Alva R. Davis
as chairman, and the program was as follows: 

Topic: Statistical Problems in Biology. 

Papers: A Mathematical Model of the Relation between White and Yolk Weights of Birds' Eggs.

G. A. Baker, University of California, Davis. 

Statistical Analysis for a New Procedure in Sensitivity Experiments. 

W. J. Dixon, University of Oregon, and A. M. Mood, Iowa State College. 

The Relation of Inbreeding to Calf Mortality. 

P. W. Gregory, University of California, Davis. 

Cooperative Field Trials. 

P. A. Minges, University of California, Davis. 

Population Genetics. 

N. H. Horowitz, California Institute of Technology. 

Statistical Problems in Assessing Methods of Medical Diagnosis, with Particular
Reference to X-Ray Technique.

J. Yerushalmy, United States Public Health Service, Washington, D. C. 

Discussion: J. Neyman, University of California, Berkeley. 

Professor John W. Miles was chairman of the afternoon session on June 18, 
which was a joint session with the California Section of the American Society
for Quality Control. The following papers were presented: 



Topic: Industrial Applications of Statistics. 

Papers: Operating Characteristics of Average and Range Charts. 

Henry Scheffé, University of California, Los Angeles.

Sampling Inspection by Variables. 

Herbert Solomon, Stanford University. 

Some Exact Numerical Results for Sequential Acceptance Sampling by Attributes. 
Mark W. Eudey, University of California, Berkeley. 

Choice of Inspection Stringency in Acceptance Sampling by Attributes.

Joseph L. Hodges, University of California, Berkeley. 

Widening Tolerances for Closer Fitting Parts. 

Edmond E. Bates, Quality Engineering Consultants, Los Angeles. 
Discussion: Russell O’Neill, University of California, Los Angeles. 
Re-establishing Operator Responsibility for Quality Control. 

Wyatt H. Lewis, General Electric Company, Ontario, California. 

Discussion: William B. Rice, Plomb Tool Company, Los Angeles. 

The Application of Learning Curves to Industrial Planning. 

James R. Crawford, Wright Field, Dayton, Ohio. 

The Wednesday evening session was under the chairmanship of Professor 
George Beadle, California Institute of Technology, with the following program: 

Topic: Statistical Problems in Genetical Studies in Chickens. 

Paper: Rate of Genetic Gain in Egg Production in Progeny-tested Flocks as a Function of 
the Interval between Generations. 

Everett R. Dempster and I. Michael Lerner, University of California, Berkeley. 

On Thursday morning, June 19, there was a joint session with the Western 
Psychological Association. Professor Helen Walker of Columbia University was 
chairman. The program was as follows: 

Topic: Statistical Problems in Psychology. 

Papers: Statistical Criteria of the Effectiveness of Selective Procedures. 

R. F. Jarrett, University of California, Berkeley. 

Unsolved Statistical Problems Arising in Psychological Measurements. 

Helen Walker, Columbia University. 

Cost Utility Curves as a Means of Assessing Batteries of Tests. 

Joseph Berkson, Mayo Clinic. 

Approaches to Univocal Factor Scores. 

J. P. Guilford, University of Southern California. 

The afternoon session on June 19 was under the chairmanship of Professor 
Harald Cramer of Stockholm, Sweden, and offered the following program: 

Topic: Theory of Statistics and its Applications to Astronomy. 

Papers: Random Variables with Comparable Peakedness. 

Z. W. Birnbaum, University of Washington. 

Distributions which Lead to Regressions Representable by Polynomials. 

Evelyn Fix, University of California, Berkeley. 

Optimum Tests of Composite Hypotheses with One Constraint. 

Erich L. Lehmann, University of California, Berkeley. 

Estimation of a Distribution Function by Confidence Limits. 

Frank J. Massey, Jr., University of California, Berkeley. 

A Note on Sequential Confidence Sets. 

Charles Stein, Columbia University. 





Certain Types of Statistical Problems in Astronomy. 

Robert J. Trumpler, University of California, Berkeley. 

Basic Concepts of the Theory of Statistics in Relation to Certain Problems of 
Astronomy. 

J. Neyman, University of California, Berkeley. 

A Note on the Problem of Binary Stars. 

Elizabeth L. Scott, University of California, Berkeley. 

Explicit Solution of the Problem of Fitting a Straight Line when both Variables 
are Subject to Error for the Case of Unequal Weights. (By title) 

Elizabeth L. Scott, University of California, Berkeley. 

Power Function of the Analysis of Variance and Covariance Test of a Normal 
Bivariate Population. (By title) 

Way Ming Chen, University of California, Berkeley. 

Unbiased Estimates with Minimum Variance. (By title) 

Charles Stein, Columbia University. 

Sufficient Statistics and a System of Partial Differential Equations. (By title) 
Erich L. Lehmann, University of California, Berkeley, and Henry Scheffé,
University of California, Los Angeles.


On Wednesday evening, June 18, at 6 o’clock, there was a dinner for members 
and guests, at the Hotel San Diego. 


P. S. DWYER 
Secretary 



ON OPTIMUM TESTS OF COMPOSITE HYPOTHESES WITH ONE CONSTRAINT¹

By E. L. Lehmann
University of California, Berkeley

Summary. This paper is concerned with optimum tests of certain composite
hypotheses. In section 2 various aspects of a theorem of Scheffé concerning type
$B_1$ tests are discussed. It is pointed out that the theorem can be extended to
cover uniformly most powerful tests against a one-sided set of alternatives.
It is also shown that the method for determining explicitly the optimum test
region may in certain cases be reduced to a simple formal procedure. These
results are used in section 3 to obtain optimum tests for the composite hypothesis
specifying the value of the circular serial correlation coefficient in a normal
distribution. A surprising feature of this example is the fact that for the simple
hypothesis obtained by specifying values for the nuisance parameters no test
with the corresponding optimum properties exists.

In section 4 the totality of similar regions is obtained for a large class of
probability laws which admit a sufficient statistic. Some composite hypotheses
concerning exponential and rectangular distributions are treated in section 5.
It is proved that the likelihood ratio tests of these hypotheses have various
optimum properties.

1. Introduction. In developing tests for a class of hypotheses three phases 
may be distinguished. First, tests are obtained which are intuitively appealing; 
next, it is shown that these tests have certain attractive features; finally, it is 
proved that they are “best possible” tests. 

In dealing with parametric hypotheses, the likelihood ratio principle is
frequently used to obtain a reasonable test. For many of the tests so derived for
normal and exponential distributions, the question of bias has been investigated.
In most cases unbiasedness has been established; in the other cases, a test
based on the same criterion but with the boundaries shifted can usually be proved
to be unbiased. Other desirable properties which likelihood ratio tests have been
shown to possess relate to the asymptotic behaviour of these tests as the sample
sizes tend to infinity. An interesting problem which does not seem to have been
treated is the question of admissibility of likelihood ratio tests, a test being
admissible if its power cannot be improved upon uniformly by any other test of
the same level of significance.

Investigations of optimum tests of composite hypotheses have been carried
through for many hypotheses concerning normal distributions. When the
hypothesis specifies the value of one parameter (hypothesis with one constraint),
uniformly most powerful one-sided and type $B_1$ (uniformly most powerful
unbiased) tests have been obtained. When the number of constraints is larger
than one, not so much can be expected. It has been shown for some of the tests
in this class that they have maximum average power uniformly over a family of
surfaces in the parameter space, or that they are uniformly most powerful with
respect to the subclass of tests whose power depends only on some function of
the parameters. (All optimum properties mentioned are relative only to the
class of all similar regions. This will be so throughout the paper and will usually
not be stated explicitly.)

¹ Presented at a meeting of the Institute of Mathematical Statistics in San Diego, June,
1947.

Two methods for finding uniformly most powerful or uniformly most powerful
one-sided regions and type $B_1$ tests, if they exist, are known. Neyman and
Pearson [1] developed a method for determining all similar regions, and applied it
to obtain uniformly most powerful one-sided tests of certain hypotheses.
Neyman [2, 3] extended the method to obtain, for certain hypotheses, the class of all
bisimilar (unbiased similar) regions, and Scheffé [4], developing the method
further, proved the existence of type $B_1$ tests for an important class of hypotheses.

A different method for obtaining all similar and bisimilar regions was devised
by P. L. Hsu and was used by him and other writers to prove various optimum
properties of the likelihood ratio tests for the general linear hypothesis, of
Hotelling's $T^2$ and of other tests [5, 6, 7, 8].

In the present paper we are concerned with applications of these two methods 
to composite hypotheses with one constraint. However, the applicability is not 
so restricted. In fact, the second method has been used mainly in connection 
with composite hypotheses with many constraints, and the author believes it to 
be suitable also for deriving optimum classification procedures. An essential 
restriction of both methods seems to be that a set of sufficient statistics must exist 
with respect to the parameters involved: with respect to the nuisance parameters 
so that all similar regions can be found, with respect to the parameters specified 
by the hypothesis so that there exists a best of all similar regions. 

Extensions of the existing theory based on the first method are obtained in
section 2, and the theory is applied in section 3 to a hypothesis concerning a
multivariate normal distribution. Sections 4 and 5 are concerned with applications
of the second method to problems to most of which the earlier method is not
applicable, in particular to hypotheses concerning exponential and rectangular
distributions, hitherto only treated from the likelihood ratio point of view.

2. On the theory of optimum tests. 

2.1 One-sided tests. In an interesting paper [4], Scheffé determined the type
$B$ and type $B_1$ tests of a certain class of composite hypotheses specifying the
value $\theta_0$ of a parameter $\theta$ in the presence of nuisance parameters.

Scheffé's results can, in an obvious way, be extended to cover one-sided sets
of alternatives. To show this, consider the method used in [4]. Under certain
assumptions all tests² are found which satisfy the two conditions:

(a) The power function at $\theta_0$ has a preassigned value $\epsilon$ (the level of
significance), independent of the nuisance parameters;

(b) the power function at $\theta_0$ has derivative 0. (Condition of unbiasedness).

Then that test $w_0$ is determined for which, of all those satisfying (a) and (b),

(c) the second derivative at $\theta_0$, $\beta''_w(\theta_0)$, is as large as possible.

By definition $w_0$ is a type $B$ test. Under a certain additional assumption (this
is the convexity assumption $\partial^2 g / \partial y_1^2 > 0$ of Scheffé's Theorem 2) it is shown that of all
tests satisfying (a) and (b), $w_0$ has maximum power against all alternatives,
i.e. is of type $B_1$.

If now we want to maximize the power against only the one-sided set of
alternatives, $\theta > \theta_0$, we determine that test $w_1$ of all those satisfying (a), for which

(d) the first derivative at $\theta_0$, $\beta'_w(\theta_0)$, is as large as possible.

Under a certain additional assumption (in Scheffé's notation this would be the
monotonicity assumption $\partial g / \partial y_1 > 0$) it can then be shown that of all tests
satisfying (a), $w_1$ has maximum power against all alternatives $\theta > \theta_0$ (it also has
minimum power against all alternatives $\theta < \theta_0$), i.e. $w_1$ is uniformly most
powerful against alternatives $\theta > \theta_0$. We shall not carry through the discussion
in detail since Scheffé's argument applies step by step, with only the obvious
changes.

² The terms “the test $w$” and “the region [of rejection] $w$” will be used interchangeably.

2.2 Determination of the boundaries. Let $X_1, \cdots, X_n$ be $n$ random variables
with a joint probability density function $p$, depending on parameters $\theta_1$ and
$\theta = (\theta_2, \cdots, \theta_l)$. We shall denote the probability density function of a set of
random variables $X_1, \cdots, X_n$ whose distribution depends on a parameter $\theta$ by
$p(x_1, \cdots, x_n \mid \theta)$ or simply by $p(x_1, \cdots, x_n)$ when the dependence on $\theta$ is
clear from the context. The set of points $(x_1, \cdots, x_n)$ for which
$p(x_1, \cdots, x_n \mid \theta)$ is positive we shall denote by $W_+(\theta)$.

Let

$$\varphi_i(x_1, \cdots, x_n) = \frac{\partial}{\partial \theta_i} \log p(x_1, \cdots, x_n \mid \theta_1, \theta), \qquad (i = 1, \cdots, l), \tag{2.1}$$

and let the random variable $\Phi_i$ be defined by

$$\Phi_i = \varphi_i(X_1, \cdots, X_n). \tag{2.2}$$

Then for testing the hypothesis $H: \theta_1 = \theta_1^0$, under the assumptions stated by
Scheffé, the type $B_1$ test $w_0$ is defined by the inequalities

$$\varphi_1 < k_1, \; > k_2 \qquad (k_1 < k_2) \tag{2.3}$$

where $k_1$, $k_2$ depend on $\theta_1^0$, $\theta$, $\varphi_2, \cdots, \varphi_l$ and are determined by the two
equations³

$$\int_{k_1}^{k_2} \varphi_1^s \, p(\varphi_1, \cdots, \varphi_l)\, d\varphi_1 = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1). \tag{2.4}$$

The equations (2.3) and (2.4) are not suitable for the determination of the
boundary of $w_0$. The variables have to be transformed so as to obtain for
$w_0$ an expression from which the calculation of the boundaries becomes feasible
(cf. [9]). This part of the work may be formalized in the following theorem.

³ Although $k_1$ and $k_2$ may depend on $\theta$, $w_0$ is independent of $\theta$, as was shown in [4].
Theorem 1. Let

$$U = f(\Phi_1, \Phi_2, \cdots, \Phi_l), \qquad V_i = g_i(\Phi_2, \cdots, \Phi_l), \qquad (i = 2, \cdots, l), \tag{2.5}$$

be a system of functions, continuously differentiable and with non-vanishing
Jacobian almost everywhere, and such that

(i) $U$ is a linear function of $\Phi_1$:

$$U = a\Phi_1 + b \tag{2.6}$$

with coefficients which may depend on $\Phi_2, \cdots, \Phi_l$ and such that⁴ $a(\Phi_2, \cdots, \Phi_l) > 0$,

(ii) it is possible to solve for $\Phi_2, \cdots, \Phi_l$ in terms of the $V$'s,

(iii) under the hypothesis $H$, $U$ is distributed independently of
$V = (V_2, \cdots, V_l)$.

Then the region $w_0$ is equivalent to the region

$$u < c_1, \; > c_2 \qquad (c_1 < c_2) \tag{2.7}$$

where $c_1$, $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s p(u)\, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s p(u)\, du \qquad (s = 0, 1). \tag{2.8}$$

Proof.

$$p(\varphi_1, \varphi_2, \cdots, \varphi_l) = p(u, v_2, \cdots, v_l) \left| \frac{\partial(u, v_2, \cdots, v_l)}{\partial(\varphi_1, \cdots, \varphi_l)} \right| = p(u) \cdot p(v_2, \cdots, v_l) \left| \frac{\partial u}{\partial \varphi_1} \right| \left| \frac{\partial(v_2, \cdots, v_l)}{\partial(\varphi_2, \cdots, \varphi_l)} \right|. \tag{2.9}$$

But

$$u = a(\varphi_2, \cdots, \varphi_l) \cdot \varphi_1 + b(\varphi_2, \cdots, \varphi_l) = \alpha(v_2, \cdots, v_l) \cdot \varphi_1 + \beta(v_2, \cdots, v_l) \tag{2.10}$$

so that (2.4) reduces to

$$\int_{c_1}^{c_2} p(u)\, p(v_2, \cdots, v_l) \left( \frac{u - \beta}{\alpha} \right)^s du = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1) \tag{2.11}$$

and hence to

$$\int_{c_1}^{c_2} u^s p(u)\, du = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1) \tag{2.12}$$

which shows $c_1$ and $c_2$ to be independent of the $v$'s. Also obviously (2.3)
transforms into (2.7) which completes the proof.

⁴ A similar theorem holds when we assume $a(\Phi_2, \cdots, \Phi_l) < 0$.

If $U$ is such that its distribution (when $\theta_1 = \theta_1^0$) is independent of $\theta$, $c_1$ and $c_2$
of Theorem 1 will depend only on the data of the problem: $\epsilon$, $n$, $\theta_1^0$. However, the
existence of constants $c_1$ and $c_2$ satisfying (2.8) still has to be proved. We may
show more generally the existence of $k_1$ and $k_2$ satisfying (2.4). A proof is
immediately supplied by an argument which was used by Neyman [10] and Wald
[11] to prove the existence of type $A$ tests, and which may be stated in the
following

Lemma. Let $0 < \alpha < 1$, let $f(x) \ge 0$ and $\int_{-\infty}^{\infty} |x|^s f(x)\, dx < \infty$ for $s = 0, 1$. Then
there exist $A$, $B$ such that

$$\int_A^B x^s f(x)\, dx = \alpha \int_{-\infty}^{\infty} x^s f(x)\, dx \qquad (s = 0, 1). \tag{2.13}$$
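As a numerical illustration of the Lemma, the following Python sketch solves the pair of equations (2.13) for one particular density. The choice $f(x) = e^{-x}$ for $x \ge 0$ (and 0 elsewhere), the function name `solve_limits`, and the use of bisection are all assumptions made for the example; they are not taken from the argument of Neyman and Wald. For this $f$ both integrals over the whole line equal 1, and subtracting the equation for $s = 0$ from that for $s = 1$ gives $A e^{-A} = B e^{-B}$, so that $B$ is determined by $A$ and a single equation in $A$ remains.

```python
import math


def solve_limits(alpha, iterations=200):
    """Return (A, B) such that the integral of x^s e^{-x} over [A, B]
    equals alpha for both s = 0 and s = 1.

    Illustrative assumption: f(x) = exp(-x) on x >= 0, so that both
    integrals over the whole line are equal to 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    def g(x):
        return x * math.exp(-x)

    def partner(a):
        # For 0 < a < 1 find b > 1 with g(b) = g(a).
        # g increases on [0, 1] and decreases on [1, infinity).
        target = g(a)
        lo, hi = 1.0, 2.0
        while g(hi) > target:
            hi *= 2.0
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if g(mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # The mass of [a, partner(a)] falls from 1 to 0 as a runs from 0 to 1,
    # so the equation for s = 0 can be solved by bisection on a.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        a = 0.5 * (lo + hi)
        mass = math.exp(-a) - math.exp(-partner(a))
        if mass > alpha:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return a, partner(a)


if __name__ == "__main__":
    A, B = solve_limits(0.95)
    print(A, B)
```

In the setting of (2.8) the constant $\alpha$ of the Lemma plays the role of $1 - \epsilon$, and $A$, $B$ play the roles of $c_1$, $c_2$.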


3. Testing for circular serial correlation in a normal population. We now
apply the results of the previous section to obtain the optimum tests (i.e.
uniformly most powerful against the one-sided set of alternatives, type $B_1$ in the
two-sided case) for the hypothesis specifying the value of the circular serial
correlation coefficient in the normal population considered by Dixon [12]. (For
the literature on testing for non-circular serial correlation in normal populations
cf. [12]).

We assume

$$p(x_1, \cdots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi}\,\sigma)^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n [(x_i - \xi) - \delta(x_{i+1} - \xi)]^2 \right\} \tag{3.1}$$

where $x_{n+1} = x_1$ and $|\delta| < 1$, and we test the hypothesis $\delta = \delta_0$. For testing
purposes presumably only the value $\delta_0 = 0$ is of interest; however, the family of
tests for arbitrary $\delta_0$ is required for estimating $\delta$ by means of confidence intervals,
and therefore the more general hypothesis is considered.

Making a transformation in one of the parameters we write

$$p(x_1, \cdots, x_n) = C(\delta, \alpha) \exp\left\{ \alpha \left[ (1 + \delta^2) \sum_{i=1}^n (x_i - \xi)^2 - 2\delta \sum_{i=1}^n (x_i - \xi)(x_{i+1} - \xi) \right] \right\} \tag{3.2}$$

where in the notation of the previous section $\theta_1 = \delta$, $\theta_2 = \alpha$, $\theta_3 = \xi$.

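Theorem 2. For testing the hypothesis $\delta = \delta_0$ for the distribution (3.2)

(a) the type $B_1$ test exists and is given by

$$r < r_1, \; > r_2 \tag{3.3}$$

where

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, \tag{3.4}$$

and where $r_1$ and $r_2$ are determined by

$$\int_{r_1}^{r_2} \left( \frac{r}{1 + \delta_0^2 - 2\delta_0 r} \right)^s p(r)\, dr = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1). \tag{3.5}$$

(b) the uniformly most powerful similar region for testing $H$ against the
alternatives $\delta > \delta_0$ exists and is given by

$$r > r' \tag{3.6}$$

where $r'$ is determined by⁵

$$\int_{-\infty}^{r'} p(r)\, dr = (1 - \epsilon) \int_{-\infty}^{\infty} p(r)\, dr. \tag{3.7}$$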


Proof. We compute

$$\begin{aligned}
\varphi_1 &= C_1(\delta_0, \alpha) + 2\alpha\left[\delta_0 \sum (x_i - \xi)^2 - \sum (x_i - \xi)(x_{i+1} - \xi)\right] \\
\varphi_2 &= C_2(\delta_0, \alpha) + (1 + \delta_0^2) \sum (x_i - \xi)^2 - 2\delta_0 \sum (x_i - \xi)(x_{i+1} - \xi) \\
\varphi_3 &= -2n\alpha(1 - \delta_0)^2(\bar{x} - \xi).
\end{aligned} \tag{3.8}$$

There is no difficulty in checking the conditions of Scheffé's theorems [4].

Next we apply Theorem 1 of the previous section, and define

$$\begin{aligned}
V_2 &= (1 + \delta_0^2) \sum (X_i - \bar{X})^2 - 2\delta_0 \sum (X_i - \bar{X})(X_{i+1} - \bar{X}) \\
V_3 &= \bar{X} - \xi \\
U &= \frac{\sum (X_i - \bar{X})(X_{i+1} - \bar{X})}{V_2}.
\end{aligned} \tag{3.9}$$

Conditions (i) and (ii) of Theorem 1 are easily seen to be satisfied. To show that
$U$ is independent of $V = (V_2, V_3)$ we employ arguments which have recently
been used by various authors in a number of similar problems (cf. [13, 14, 15]).

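It is seen that an orthonormal transformation exists:

$$X_1, \cdots, X_n \to Y_1, \cdots, Y_n$$

such that

$$\sqrt{n}\,\bar{X} = Y_1, \qquad \sum_{i=1}^n (X_i - \bar{X})(X_{i+1} - \bar{X}) = \sum_{i=2}^n \lambda_i Y_i^2, \qquad \sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=2}^n Y_i^2. \tag{3.10}$$

⁵ A corresponding result holds for the other one-sided case.

Under $H$ the $Y$'s are distributed with probability density

$$p(y_1, \cdots, y_n) = C(\delta_0, \alpha) \exp\left\{ \alpha \left[ k (y_1 - \sqrt{n}\,\xi)^2 + \sum_{i=2}^n \mu_i y_i^2 \right] \right\} \tag{3.11}$$

where $k, \mu_2, \cdots, \mu_n$ depend on $\delta_0$ and where the $\mu$'s are all positive. Introducing
new variables

$$Z_i = \sqrt{\mu_i}\, Y_i, \qquad (i = 2, \cdots, n), \tag{3.12}$$

and, then, generalized polar coordinates in the space of the $Z$'s

$$R = \sqrt{\sum_{i=2}^n Z_i^2}, \qquad \Psi_1, \cdots, \Psi_{n-2} \tag{3.13}$$

we see that $Y_1$, $R$ and $\Psi_1, \cdots, \Psi_{n-2}$ are completely independent. Also

$$V_2 = R^2, \qquad V_3 = \frac{1}{\sqrt{n}}\, Y_1 - \xi$$

while $U$, being homogeneous of degree 0 in the $Z$'s, is a function of the $\Psi$'s only.
This proves that $U$, $V_2$ and $V_3$ are completely independent. The type $B_1$ test
of $H$ is therefore given by

$$u = \frac{\sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})}{(1 + \delta_0^2) \sum_{i=1}^n (x_i - \bar{x})^2 - 2\delta_0 \sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})} < c_1, \; > c_2 \tag{3.14}$$

where $c_1$ and $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s p(u)\, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s p(u)\, du \qquad (s = 0, 1). \tag{3.15}$$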

Jci J—ao 

We still have to show that this test is equivalent to the one defined by (3.3)
and (3.5). For $\delta_0 = 0$ this is trivial. Let us assume $\delta_0 < 0$. (The other
case goes through similarly.) The inequality $u < c_1$ is equivalent to

$$(1 + 2\delta_0 c_1) \sum (x_i - \bar{x})(x_{i+1} - \bar{x}) < c_1 (1 + \delta_0^2) \sum (x_i - \bar{x})^2 \tag{3.16}$$

and hence to

$$\frac{\sum (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum (x_i - \bar{x})^2} < \frac{c_1 (1 + \delta_0^2)}{1 + 2c_1\delta_0} \tag{3.17}$$

provided $1 + 2c_1\delta_0 > 0$. Suppose $1 + 2c_1\delta_0 \le 0$, i.e. $c_1 \ge -\dfrac{1}{2\delta_0}$. Then⁶

$$P\{U < c_1\} \ge P\left\{U < -\frac{1}{2\delta_0}\right\} = P\left\{0 < \sum (X_i - \bar{X})^2\right\} = 1 \tag{3.18}$$

i.e. $P\{U < c_1\} = 1$ which would contradict (3.15). Similarly if $1 + 2c_2\delta_0 \le 0$
we would have $P\{U > c_2\} = 0$ and hence our test would be one-sided and
therefore not unbiased. The inequalities $u < c_1$, $> c_2$ are thus equivalent to the
inequalities $r < r_1$, $> r_2$ and since

$$u = \frac{r}{1 + \delta_0^2 - 2\delta_0 r}$$

(3.5) also follows.

⁶ We denote the probability of an event $A$ by $P\{A\}$.
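The relation between the two criteria can be checked numerically. The sketch below computes $r$ of (3.4) and $u$ of (3.14) from a sample, with the convention $x_{n+1} = x_1$; the function names and the sample values are illustrative assumptions only. Since $|r| \le 1$ and $|\delta_0| < 1$, the denominator $1 + \delta_0^2 - 2\delta_0 r$ is positive and $u$ is an increasing function of $r$, which is the reason why regions of the form (3.14) and (3.3) correspond to one another.

```python
def circular_sums(x):
    """Return (P, S): the circular lag-one product sum and the sum of
    squares, both taken about the sample mean, with x_{n+1} = x_1."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    p = sum(d[i] * d[(i + 1) % n] for i in range(n))
    s = sum(di * di for di in d)
    return p, s


def r_statistic(x):
    """Circular serial correlation coefficient r of (3.4)."""
    p, s = circular_sums(x)
    return p / s


def u_statistic(x, delta0):
    """Criterion u of (3.14) for the hypothesized value delta0."""
    p, s = circular_sums(x)
    return p / ((1.0 + delta0 ** 2) * s - 2.0 * delta0 * p)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    delta0 = -0.5
    r = r_statistic(sample)
    u = u_statistic(sample, delta0)
    # u and r / (1 + delta0^2 - 2 delta0 r) agree up to rounding.
    print(r, u, r / (1.0 + delta0 ** 2 - 2.0 * delta0 * r))
```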

The existence of type $B_1$ and uniformly most powerful one-sided tests of the
hypothesis $H$ is rather surprising. For when $\alpha$ and $\xi$ are assumed known, neither
the type $A_1$ test nor the uniformly most powerful one-sided test of the simple
hypothesis $H'$: $\delta = \delta_0$ exists. This is easily seen by determining the most
powerful and the most powerful unbiased test against a specific alternative $\delta_1$
for the hypothesis $H'$ in the population

$$p(x_1, \cdots, x_n) = \frac{1 - \delta^n}{(2\pi)^{n/2}} \exp\left\{ -\tfrac{1}{2}\left[ (1 + \delta^2) \sum x_i^2 - 2\delta \sum x_i x_{i+1} \right] \right\}. \tag{3.19}$$

The distribution of the criterion $R$ was obtained by R. L. Anderson [16] (see
also [17]) for the case $\delta = 0$. Madow [15] using Anderson's result found the
distribution for arbitrary $\delta$. (Approximations to the distribution have been studied
by various authors; for the literature on this cf. [18]. Recently Hsu [19]
obtained an asymptotic expansion.) A direct derivation for arbitrary $\delta$ may be
based on the following theorem of Cramér, which was communicated to the
author by Dr. P. L. Hsu.

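Theorem 3. (Cramér)⁷. If $X$, $Y$ are two random variables (not necessarily
independent), $Y > 0$, then

$$P\left\{ \frac{X}{Y} \le x \right\} = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\psi(t) - \varphi_x(t)}{t}\, dt \tag{3.20}$$

where $\varphi_x$ and $\psi$ are the characteristic functions of $X - xY$ and $Y$ respectively,
provided

$$\int_{-\infty}^{\infty} \left| \frac{\varphi_x(t) - \psi(t)}{t} \right| dt < \infty. \tag{3.21}$$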


⁷ Differentiated forms of the theorem were given by R. C. Geary [Jour. Roy. Stat. Soc.
Vol. 107 (1944) p. 56] and H. Cramér [Exercise 6 on p. 317 of Mathematical Methods of
Statistics. Princeton Univ. Press (1946)].

Theorem 4. If

$$p(x_1, \cdots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi}\,\sigma)^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n [(x_i - \xi) - \delta(x_{i+1} - \xi)]^2 \right\}, \qquad (x_{n+1} = x_1) \tag{3.22}$$



and if 

(3.23) 

then 

(3.24) 


Z (Xi - Xxx m - X) 


P|R > r) = 


R = 


e\n+l/2 


E (x,- - x J ) 

»-l 


1 - 8 n 


n (1 - 6)(1 + 5 ? - 25r) 

( —l/ +1 (X,- — r) n_,/ * sin — sin 


•E 


n n 


1 + 5 2 - 28\j 


where the summation is extended over all integer j, 1 < j < -, for which Xy > r, arid 
where 


(3.25) X, = 2 cos ^. 

The proof of this theorem from Theorem 3 is straightforward and will only
be indicated here. If $X$ and $Y$ denote the numerator and denominator of $R$
respectively, the characteristic functions of $Y$ and $X - rY$ may be obtained by
the method of circulants (cf. [12, 17]). The integral on the right hand side of
(3.20) is then easily evaluated by the theory of residues when $n$ is odd. In the
case that $n$ is even, the integrand has two branch points, one in the lower and one
in the upper half plane. These may be separated, and then again the method
of residues may be applied.


4. Similar regions. The problem of finding all regions similar to the sample
space with respect to a parameter $\theta$ was solved by Neyman and Pearson [1] for
a certain class of probability laws. In a later paper Neyman proved ([20]
proposition IX) that if there exists a sufficient statistic $T$ for a parameter $\theta$,
then $w$ is similar with respect to $\theta$ if it has the following structure: For the
intersection $w(t)$ of $w$ with the surface $T = t$, the relative probability of $w(t)$ given
$T = t$ has a constant value independent of $t$. We shall show in this section that
for a large class of probability laws which admit a sufficient statistic for $\theta$ the
regions with the above structure are the only ones that are similar with respect
to $\theta$.

We consider samples from a univariate distribution and we distinguish three
cases according as one, both or neither of the extremes of the range of the
distribution depend on the parameter $\theta$. For the first of these cases (cf. Pitman [21])
we consider samples from a distribution with probability density

$$p(x) = \frac{f(x)}{g(\theta)}, \qquad k(\theta) \le x \le c, \tag{4.1}$$

where $k(\theta)$ is a strictly monotone continuous function of $\theta$ and where $c$ may be
infinite. Introducing a new parameter $\delta = k(\theta)$ the distribution of a sample
from (4.1) is given by

$$p(x_1, \cdots, x_n) = \frac{f(x_1) \cdots f(x_n)}{[g(\delta)]^n}, \qquad \delta \le x_i \le c. \tag{4.2}$$

To obtain the totality of regions $w$ similar with respect to $\delta$ let us denote by
$W_1, \cdots, W_n$ the portions of the sample space where the smallest of the $x$'s is
$x_1, \cdots, x_n$ respectively. For any region $w$ denote by $w_k$ the intersection of $w$
with $W_k$. Consider a transformation carrying $W_2, \cdots, W_n$ into $W_1$, letting
$y_1 = \min(x_1, \cdots, x_n)$ and letting in $W_k$:

$$y_2 = x_1, \; y_3 = x_2, \; \cdots, \; y_k = x_{k-1}, \; y_{k+1} = x_{k+1}, \; \cdots, \; y_n = x_n. \tag{4.3}$$

Denote by $w'_k$ the image of $w_k$ under this transformation. The condition that
$w$ be similar with respect to $\delta$,

$$\int_w \frac{f(x_1) \cdots f(x_n)}{[g(\delta)]^n}\, dx_1 \cdots dx_n \overset{(\delta)}{\equiv} \epsilon \tag{4.4}$$

may be written in the form

$$\sum_{k=1}^n \int_\delta^c \frac{f(y_1)}{[g(\delta)]^n} \left\{ \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n \right\} dy_1 = n\epsilon \int_\delta^c \frac{f(y_1)}{[g(\delta)]^n} \left\{ \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n \right\} dy_1 \tag{4.5}$$

where $W(y_1)$ denotes the region $y_1 \le y_i \le c$, $(i = 2, \cdots, n)$, that is, the region
of variation of $y_2, \cdots, y_n$ given $y_1$, and where $w'_k(y_1)$ denotes the region of
variation of $y_2, \cdots, y_n$ given $y_1$ and $w'_k$. From (4.5) we obtain

$$\int_\delta^c f(y_1)\, \psi(y_1)\, dy_1 \overset{(\delta)}{\equiv} 0 \tag{4.6}$$

where

$$\psi(y_1) = \sum_{k=1}^n \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n - n\epsilon \int_{y_1}^c \cdots \int_{y_1}^c f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n. \tag{4.7}$$

But (4.6) implies

$$\psi(y_1) = 0 \quad \text{almost everywhere} \tag{4.8}$$

and since we can only determine $w$ up to a set of measure 0, we may omit the
qualification in (4.8). Therefore a necessary and sufficient condition for $w$ to
be similar is

$$\frac{\sum_{k=1}^n \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n}{n \left[ \int_{y_1}^c f(y)\, dy \right]^{n-1}} = \epsilon \tag{4.9}$$

for all $y_1$.

To see more clearly the structure of these regions, let us take $n = 2$.
Equation (4.9) states that on each of the broken lines of Fig. 1 the relative probability
of $w' = w'_1 + w'_2$ given $Y_1 = y_1$ is $\epsilon$, where the decomposition of this probability
into its two components may vary with $y_1$.

In general equation (4.9) states that on each hyperplane $Y_1 = y_1$ the relative
probability of $w$ is independent of $y_1$. Since $Y_1 = \min(X_1, \cdots, X_n)$ is a
sufficient statistic for $\theta$, Neyman's theorem in this case does give all similar regions.
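The structure described by (4.9) can be illustrated by simulation for $n = 2$. The sketch below assumes the particular density $e^{-(x - \delta)}$ for $x \ge \delta$, which is of the form (4.1) with $f(x) = e^{-x}$ and $c = \infty$, and the particular region $w$: $\max(x_1, x_2) - \min(x_1, x_2) > -\log \epsilon$. Both choices, and the function name, are assumptions made for the example. Given $Y_1 = y_1$ the larger observation exceeds $y_1$ by an exponential variable with mean 1, so the relative probability of $w$ on each line $Y_1 = y_1$ is $\epsilon$, and the estimated probability of $w$ should therefore be close to $\epsilon$ whatever the value of $\delta$.

```python
import math
import random


def rejection_rate(delta, eps, trials=100_000, seed=0):
    """Monte Carlo estimate of P(w) for samples of size 2 from the
    density exp(-(x - delta)), x >= delta, where w is the region
    max(x1, x2) - min(x1, x2) > -log(eps)."""
    rng = random.Random(seed)
    k = -math.log(eps)
    hits = 0
    for _ in range(trials):
        x1 = delta + rng.expovariate(1.0)
        x2 = delta + rng.expovariate(1.0)
        if max(x1, x2) - min(x1, x2) > k:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for delta in (-2.0, 0.0, 3.0):
        print(delta, rejection_rate(delta, eps=0.05))
```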

Next let us consider the case where both extremes of the range of the
distribution depend on the parameter. We shall assume (cf. [21]) that $X_1, \cdots, X_n$
are distributed with probability density

$$p(x) = \frac{f(x)}{g(\theta)}, \qquad \theta \le x \le b(\theta), \tag{4.10}$$

where $b$ is a strictly decreasing continuous function over an interval
$[-\infty, b(-\infty)]$ and where $b[b(-\infty)] = -\infty$. These assumptions insure that there
exists a unique number $a$, $-\infty < a < b(-\infty)$, such that $b(a) = a$.

Denote by $W_{ij}$, $(i, j = 1, \cdots, n;\ i \ne j)$, the portion of the sample space
where the smallest and the largest of the $x$'s are $x_i$ and $x_j$ respectively. Denote
by $W_{ij1}$ and $W_{ij2}$ those portions of $W_{ij}$ where $x_i$ is greater than and less than
$b^{-1}(x_j)$ respectively. For any region $w$ denote by $w_{ijk}$ the intersection of $w$ with
$W_{ijk}$. Consider a transformation carrying the sample space into $W_{1n}$, letting
$y_1 = \min(x_1, \cdots, x_n)$, $y_n = \max(x_1, \cdots, x_n)$ and in $W_{ij}$ letting $y_2, \cdots, y_{n-1}$
denote the remaining $x$'s in the order of their subscripts. Next make a
transformation carrying $W_{1n}$ into $W_{1n1}$, letting $z_1 = \max[y_1, b^{-1}(y_n)]$, $z_n = \min[y_1, b^{-1}(y_n)]$
and $z_k = y_k$ for $k = 2, \cdots, n - 1$. Denote by $w'_{ijk}$ the image of
$w_{ijk}$ in $W_{1n1}$.

Then $Z_n$ is a sufficient statistic for $\theta$ (cf. [21]) and there exist functions $f_1$, $g_1$
such that the density of $Z_n$ is given by

$$p(z_n) = \frac{f_1(z_n)}{g_1(\theta)}, \qquad \theta \le z_n \le a, \tag{4.11}$$

while the distribution of the remaining $Z$'s given $Z_n$ is independent of $\theta$.

The condition that $w$ be a similar region may now be written, analogously to
(4.5), in the form

$$\int_\theta^a \frac{f_1(z_n)}{g_1(\theta)} \sum_{i,j,k} \int_{w'_{ijk}(z_n)} p(z_1, \cdots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1}\, dz_n = \epsilon \int_\theta^a \frac{f_1(z_n)}{g_1(\theta)}\, dz_n \tag{4.12}$$

and hence by the argument which led to (4.6), as

$$\sum_{i,j,k} \int_{w'_{ijk}(z_n)} p(z_1, \cdots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1} = \epsilon \quad \text{for all } z_n. \tag{4.13}$$

Thus in this case also Neyman's theorem gives the most general similar region.

For the case that neither extreme of the range of the distribution depends on
the parameter $\theta$, it has been shown by various authors [22, 21, 23] under slightly
varying assumptions concerning the regularity of the distribution function, that
the existence of a sufficient statistic implies

$$p(x \mid \theta) = \exp[P(\theta) + T(x)Q(\theta) + R(x)]. \tag{4.14}$$

This (cf. [10]) is a special case of that for which Neyman and Pearson determined
the totality of similar regions, however under the restriction that the moments
of $\Phi = \dfrac{\partial}{\partial \theta} \sum_{i=1}^n \log p(X_i)$ uniquely determine the distribution of $\Phi$. We shall
briefly indicate how this assumption may be avoided.

Let $X_1, \cdots, X_n$ be a sample from (4.14), or, more generally (this is the case
considered by Neyman and Pearson), let $X_1, \cdots, X_n$ be distributed with
probability density

$$p(x_1, \cdots, x_n) = \exp[P(\theta) + u(x_1, \cdots, x_n)Q(\theta) + v(x_1, \cdots, x_n)] \tag{4.15}$$

in a sample space $W_+$ which is independent of $\theta$. We shall assume that the set
of values which $Q$ takes on contains at least some interval. Introducing
$\delta = -Q(\theta)$ as a new parameter, we shall obtain all regions similar with respect to $\delta$
(where the set of values of $\delta$ contains an interval) for the distribution⁸

$$p(x_1, \cdots, x_n) = \exp[P_1(\delta) - \delta \cdot u(x_1, \cdots, x_n) + v(x_1, \cdots, x_n)] \tag{4.16}$$


under the assumption that 



0 except possibly on a set of measure 0. 


Let us for a moment assume that there exist functions $f_i(x_1, \cdots, x_n)$,
$(i = 2, \cdots, n)$, with continuous partial derivatives almost everywhere and such
that the transformation

$$y_1 = u(x_1, \cdots, x_n), \qquad y_i = f_i(x_1, \cdots, x_n), \qquad (i = 2, \cdots, n), \tag{4.17}$$

is one to one on $W_+$ except possibly on a subset of measure 0. Applying this
transformation we may write the condition of similarity in the form

$$\int_{-\infty}^{\infty} e^{-\delta y_1} \int_{w(y_1)} f(y_1, \cdots, y_n)\, dy_2 \cdots dy_n\, dy_1 = \epsilon \int_{-\infty}^{\infty} e^{-\delta y_1} \int_{W(y_1)} f(y_1, \cdots, y_n)\, dy_2 \cdots dy_n\, dy_1 \tag{4.18}$$

where $W(y_1)$ denotes the region of variation of $y_2, \cdots, y_n$ given $y_1$, and where
$w(y_1)$ denotes the region of variation of $y_2, \cdots, y_n$ given $y_1$ and $w$. Furthermore
$f(y_1, \cdots, y_n)$ is independent of $\delta$. From the theory of bilateral Laplace
transforms it is known that (4.18) implies that

$$\int_{w(y_1)} f(y_1, \cdots, y_n)\, dy_2 \cdots dy_n = \epsilon \int_{W(y_1)} f(y_1, \cdots, y_n)\, dy_2 \cdots dy_n \tag{4.19}$$

which is the desired result.

More generally it may be shown that our assumption concerning $u(x_1, \cdots, x_n)$
insures the existence of functions $f_i$, $(i = 2, \cdots, n)$, such that under the
transformation (4.17) no point $(y_1, \cdots, y_n)$ has more than a denumerable infinity of
counter images in $x$-space. Our proof can be modified to cover this case. The
argument is similar to that used to obtain equations (4.9) and (4.13) which were
also arrived at through many to one transformations.


⁸ An assumption that we can solve for $\theta$ as a function of $\delta$ is not needed since we can
determine $P_1(\delta)$ by integrating the density (4.16) over $W_+$.


5. Testing exponential and rectangular distributions. In their fundamental
1928 paper [24] on likelihood ratio tests, Neyman and Pearson discussed various
hypotheses relating to normal, exponential and rectangular distributions. Later
they and other authors developed a theory of similar and bisimilar regions which
made it possible to obtain optimum tests of many composite hypotheses with
one constraint concerning normal populations. This theory however is not
applicable to most hypotheses concerning exponential or rectangular
distributions. We shall in this section obtain optimum tests of some hypotheses relating
to these latter distributions, using the method of the previous section.

Let us first consider a sample $X_1, \cdots, X_n$ from an exponential population,
the probability density of the sample being

$$p(x_1, \cdots, x_n) = \frac{1}{a^n} \exp\left[ -\frac{1}{a} \sum_{i=1}^n (x_i - b) \right] \quad \text{if } x_i \ge b, \; (i = 1, \cdots, n) \tag{5.1}$$

and let us consider the two hypotheses $H_1$: $a = a_0$, $H_2$: $b = b_0$ where, without loss
of generality, we shall take $a_0 = 1$, $b_0 = 0$. The likelihood ratio tests of both
these hypotheses were shown to be completely unbiased by Paulson [25]. We
shall prove

Theorem 5. The likelihood ratio tests of $H_1$ and $H_2$ are type $B_1$ and uniformly
most powerful, respectively. The one-sided tests based on the likelihood ratio criterion
for $H_1$ are the uniformly most powerful one-sided similar regions for testing this
hypothesis.

Proof. In order to simplify the argument we shall give a detailed proof only
for the restricted class of tests which are symmetric in the variables $X_1, \cdots, X_n$.

For testing $H_1$ let us make the following transformation introduced by
Sukhatme [26]:

$$Z_1 = nY_1, \qquad Z_i = (n - i + 1)(Y_i - Y_{i-1}), \qquad (i = 2, \cdots, n), \tag{5.2}$$

where $Y_i$ is the $i$th of the $X$'s in order of magnitude. Then

$$p(z_1, \cdots, z_n) = \frac{1}{a^n} \exp\left[ -\frac{1}{a}(z_1 - nb) - \frac{1}{a} \sum_{i=2}^n z_i \right] \quad \text{if } z_1 \ge nb;\ z_i \ge 0 \ (i = 2, \cdots, n). \tag{5.3}$$
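The transformation (5.2) is easily carried out numerically. The following sketch (the function name and the data are hypothetical) orders the sample, forms the $Z$'s, and can be used to check the two identities on which the sequel rests: $\sum_{i=1}^n z_i = \sum_{i=1}^n x_i$, which gives the exponent in (5.3), and $\sum_{i=2}^n z_i = \sum_{i=1}^n [x_i - \min(x_1, \cdots, x_n)]$.

```python
def sukhatme_transform(x):
    """Return [z_1, ..., z_n] of (5.2): z_1 = n*y_1 and
    z_i = (n - i + 1) * (y_i - y_{i-1}), where y_1 <= ... <= y_n
    are the observations in order of magnitude."""
    y = sorted(x)
    n = len(y)
    z = [n * y[0]]
    for i in range(2, n + 1):  # i is the 1-based index used in (5.2)
        z.append((n - i + 1) * (y[i - 1] - y[i - 2]))
    return z


if __name__ == "__main__":
    sample = [3.0, 1.0, 2.5, 4.0]  # hypothetical data
    z = sukhatme_transform(sample)
    print(z)
    print(sum(z), sum(sample))
    print(sum(z[1:]), sum(xi - min(sample) for xi in sample))
```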


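We want to determine all regions $w$ which under $H_1$ are similar to the sample
space with respect to $b$, i.e. all regions $w$ satisfying

$$\int_w e^{-(z_1 - nb)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 = \int_{nb}^{\infty} e^{-(z_1 - nb)} \left\{ \int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n \right\} dz_1 \overset{(b)}{\equiv} \epsilon = \epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)}\, dz_1 \tag{5.4}$$

where $w(z_1)$ denotes the intersection of $w$ with the hyperplane $Z_1 = z_1$. Now
(5.4) is equivalent to

$$\int_{nb}^{\infty} e^{-z_1} I(z_1)\, dz_1 \overset{(b)}{\equiv} 0 \tag{5.5}$$

where

$$I(z_1) = \int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n - \epsilon \tag{5.6}$$

and this in turn is equivalent to

$$I(z_1) = 0 \quad \text{for all } z_1. \tag{5.7}$$
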
Of all the regions $w$ satisfying (5.7) we want to determine the one which against
a specific alternative, say $a_1$, has maximum power, i.e. for which

(5.8)   $\int_{nb}^{\infty} e^{-(z_1 - nb)/a_1} \int_{w(z_1)} \exp\left[ -\frac{1}{a_1} \sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz_1$

is as large as possible. We thus see that $w$ will have the desired properties if
$w(z_1)$ is determined according to the two conditions

(5.9)   $\int_{w(z_1)} \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n = \epsilon$

and

(5.10)  $\int_{w(z_1)} \exp\left[ -\frac{1}{a_1} \sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n = \max.$

Hence by the Neyman-Pearson fundamental lemma $w(z_1)$ is the set of points
satisfying

(5.11)  $\exp\left[ -\frac{1}{a_1} \sum_{i=2}^{n} z_i + \sum_{i=2}^{n} z_i \right] \ge C(a_1, z_1)$

and therefore according as $a_1$ is greater or less than 1, $w(z_1)$ is determined by

(5.12)  $\sum_{i=2}^{n} z_i = \sum_{i=1}^{n} [x_i - \min(x_1, \dots, x_n)] \ge k(a_1, z_1)$,   or
        $\sum_{i=2}^{n} z_i = \sum_{i=1}^{n} [x_i - \min(x_1, \dots, x_n)] \le k'(a_1, z_1)$.

But $\sum_{i=2}^{n} Z_i$ is independently distributed of $Z_1$ and under H the distribution of
$\sum_{i=2}^{n} Z_i$ does not depend on $a_1$; in fact $2 \sum_{i=2}^{n} Z_i$ has a chi-square distribution
with $2n - 2$ degrees of freedom. Thus $k$ and $k'$, as determined by (5.9), are independent of
$a_1$ and the two tests (5.12) are uniformly most powerful one-sided.

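Next we consider the more restricted class of unbiased similar regions. For $w$
to be unbiased we must have

(5.13)  $\frac{\partial}{\partial a} \left\{ \frac{1}{a^n} \int_{nb}^{\infty} \exp\left[ -\frac{z_1 - nb}{a} \right] \int_{w(z_1)} \exp\left[ -\frac{1}{a} \sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz_1 \right\} \Big|_{a=1}$
        $= \int_{nb}^{\infty} (z_1 - nb - n) \exp[-(z_1 - nb)] \int_{w(z_1)} \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz_1$
        $+ \int_{nb}^{\infty} \exp[-(z_1 - nb)] \int_{w(z_1)} \left( \sum_{i=2}^{n} z_i \right) \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz_1 = 0.$

The first of the integrals in the middle member equals

(5.14)  $\int_{0}^{\infty} (z - n) e^{-z} \int_{w(z + nb)} \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz = \epsilon \int_{0}^{\infty} (z - n) e^{-z} \, dz = -(n - 1)\epsilon.$

Therefore

(5.15)  $\int_{nb}^{\infty} e^{-(z_1 - nb)} \int_{w(z_1)} \left( \sum_{i=2}^{n} z_i \right) \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n \, dz_1 = (n - 1)\epsilon = (n - 1)\epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)} \, dz_1$

or

(5.16)  $\int_{nb}^{\infty} e^{-z_1} \phi(z_1) \, dz_1 = 0$

where

(5.17)  $\phi(z_1) = \int_{w(z_1)} \left( \sum_{i=2}^{n} z_i \right) \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n - (n - 1)\epsilon.$

Thus finally the condition of unbiasedness reduces to

(5.18)  $\int_{w(z_1)} \left( \sum_{i=2}^{n} z_i \right) \exp\left[ -\sum_{i=2}^{n} z_i \right] dz_2 \cdots dz_n = (n - 1)\epsilon$

and we seek the region $w(z_1)$ which satisfies (5.9), (5.10) and (5.18).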

By the fundamental lemma $w(z_1)$ is given by

(5.19)  $\exp\left[ -\frac{1}{a_1} \sum_{i=2}^{n} z_i \right] \ge \left[ C_1(a_1, z_1) \sum_{i=2}^{n} z_i + C_2(a_1, z_1) \right] \cdot \exp\left[ -\sum_{i=2}^{n} z_i \right]$

which is equivalent to

(5.20)  $\sum_{i=2}^{n} z_i \le k_1(a_1, z_1)$,   $\ge k_2(a_1, z_1)$

where $k_1$ and $k_2$ are determined by (5.9) and (5.18), and are therefore independent
of $z_1$ and $a_1$. Thus the region (5.20) which of all unbiased similar regions
maximizes the power against the alternative $a = a_1$ is independent of $a_1$ and
hence is a region of type $B_1$. This completes the proof since it is easily verified
that (5.20) is equivalent to the likelihood ratio test.

The proof for regions which are not necessarily symmetric in the variables
follows similarly if instead of the transformation (5.2) one uses a transformation
$U_i = f_i(X_1, \dots, X_n)$ which is one to one and such that $U_1 = Z_1$, $U_2 = \sum_{i=2}^{n} Z_i$.
The distribution of $U_3, \dots, U_n$ is then independent of $a$ and $b$ since $U_1$, $U_2$
are a pair of sufficient statistics for these parameters, and the proof carries over
step by step.

Next we consider the hypothesis $H_2: b = 0$, and again we restrict ourselves to
regions which are symmetric in the variables, although as before the proof can
be modified to cover also nonsymmetric regions.

We first make the transformation to $Z_1, \dots, Z_n$ given by (5.2). In the
$n - 1$ dimensional space of $Z_2, \dots, Z_n$, we then transform to new variables
$U, \psi_1, \dots, \psi_{n-2}$ where $U = \sum_{i=2}^{n} Z_i$ and where the $\psi$'s are the generalized polar
angles. Obviously the distribution of the $\psi$'s does not depend on $a$, since they
are homogeneous of degree 0 in the Z's. Furthermore the $\psi$'s are independently
distributed of $U$ since the probability density of the Z's is constant over the
hyperplanes $U = u$. Thus

(5.21)  $p(z_1, u, \psi_1, \dots, \psi_{n-2}) = \frac{K}{a^n} \exp\left[ -\frac{z_1 - nb}{a} \right] u^{n-2} e^{-u/a} \, p(\psi_1, \dots, \psi_{n-2}).$

We next introduce new variables

(5.22)  $V = Z_1 + U$ and $T = \frac{Z_1}{Z_1 + U}$

and find

(5.23)  $p(v, t, \psi_1, \dots, \psi_{n-2}) = \frac{K}{a^n} \exp\left[ -\frac{v - nb}{a} \right] v^{n-1} (1 - t)^{n-2} \, p(\psi_1, \dots, \psi_{n-2})$
        for $v \ge nb$, $\frac{nb}{v} \le t \le 1$.

For $w$ under $H_2$ to be similar with respect to $a$, we must have

(5.24)  $\int_{0}^{\infty} \exp\left[ -\frac{v}{a} \right] v^{n-1} \int_{w_0(v)} (1 - t)^{n-2} p(\psi_1, \dots, \psi_{n-2}) \, dt \, d\psi_1 \cdots d\psi_{n-2} \, dv = \epsilon \int_{0}^{\infty} \exp\left[ -\frac{v}{a} \right] v^{n-1} \, dv$

where $w(v)$ designates the intersection of $w$ with the hyperplane $V = v$, and
where $w_0(v)$ denotes the part of $w(v)$ lying between the hyperplanes $t = 0$
and $t = 1$.

Hence the condition of similarity may be written as

(5.25)  $\int_{0}^{\infty} \exp\left[ -\frac{v}{a} \right] v^{n-1} f(v) \, dv = 0$ for all $a > 0$

where

(5.26)  $f(v) = \int_{w_0(v)} (1 - t)^{n-2} p(\psi_1, \dots, \psi_{n-2}) \, dt \, d\psi_1 \cdots d\psi_{n-2} - \epsilon.$

By the uniqueness theorem for Laplace transforms, (5.25) implies $f(v) = 0$
for all $v > 0$, so that the condition of similarity finally reduces to

(5.27)  $\int_{w_0(v)} (1 - t)^{n-2} p(\psi_1, \dots, \psi_{n-2}) \, dt \, d\psi_1 \cdots d\psi_{n-2} = \epsilon.$

Of all similar regions, let us find the one which has maximum power. Obviously
we want to include in $w(v)$ all points for which $t < 0$. In addition we want
to choose $w_0(v)$ such that

(5.28)  $\int_{w_b(v)} (1 - t)^{n-2} p(\psi_1, \dots, \psi_{n-2}) \, dt \, d\psi_1 \cdots d\psi_{n-2} = \max$

where $w_b(v)$ is that part of $w(v)$ in which $\max(0, nb/v) \le t$.

If, for some alternative $b$, $w_0(v)$ is contained in $\frac{nb}{v} \le t \le 1$, then $w_b(v)$ and
$w_0(v)$ coincide and hence (5.28) attains its maximum value $\epsilon$ whatever the position
of $w_0(v)$ in $\frac{nb}{v} \le t \le 1$. If on the other hand $\frac{nb}{v}$ is so close to 1 that
$\frac{nb}{v} \le t \le 1$ is too small to contain $w_0(v)$, then (5.28) attains its maximum for
any $w_0(v)$ containing $\frac{nb}{v} \le t \le 1$. There exists therefore a unique $w_0(v)$ which
maximizes (5.28) for all values of $b$ and $v$, namely the region defined by

(5.29)  $C(v) \le t \le 1$

where $C$ is determined by (5.27).

Since under $H_2$, the statistics $V$ and $T$ are independent, $C$ does not depend
on $v$. The test

(5.30)  $t < 0$,   $t \ge C$

which we have just shown to be uniformly most powerful, is also the likelihood
ratio test which completes the proof of the theorem.

We shall finally consider an example of an optimum test in connection with a
rectangular distribution. Let $X_1, \dots, X_n$ be independently and uniformly
distributed over $(a, a + \theta)$, where $\theta$ is positive. For testing the hypothesis
$H: a = a_0$, the test

(5.31)  $\frac{y_1 - a_0}{y_n - y_1} < 0$,   $\ge C$

where $Y_1$ and $Y_n$ are the smallest and the largest of the X's respectively, is the
uniformly most powerful of all similar regions.

The proof of this goes through very much like that for $H_2$ in Theorem 5.
Without loss of generality we take $a_0 = 0$. Also again, to simplify the proof,
we restrict ourselves to regions which are symmetric in the variables. We need
the following lemma.

Lemma. Let $X_1, \dots, X_n$ be independently and uniformly distributed over
$(a, a + \theta)$. Let $Y_i$ denote the $i$th X in order of magnitude, and let

(5.32)  $T_n = Y_n$,   $T_k = \frac{Y_k}{Y_{k+1}}$   $(k = 1, \dots, n - 1)$.

Then for $a \ge 0$

(5.33)  $p(t_1, \dots, t_n) = \frac{n!}{\theta^n} \, t_n^{n-1} t_{n-1}^{n-2} \cdots t_2$

when

        $a \le t_n \le a + \theta$,   $\frac{a}{t_n t_{n-1} \cdots t_{k+1}} \le t_k \le 1$,   $(k = 1, \dots, n - 1)$.

This is easily seen by applying the usual method of Jacobians. The inequalities
describing the sample space of the T's are equivalent to the following more
convenient ones:

(5.34)  $a \le t_n \le a + \theta$,   $\frac{a}{t_n} \le t_1 t_2 \cdots t_{n-1} \le 1$;   $t_k \le 1$,   $(k = 1, \dots, n - 1)$.

Let us denote by $w(t_n)$ the intersection of a region $w$ with the hyperplane
$T_n = t_n$, and by $w_0(t_n)$ that part of $w(t_n)$ contained in the cylinder $0 \le t_k \le 1$,
$(k = 1, \dots, n - 1)$; then we find as a necessary and sufficient condition for
$w$ to be similar with respect to $\theta$ (assuming H)

(5.35)  $(n - 1)! \int_{w_0(t_n)} t_{n-1}^{n-2} t_{n-2}^{n-3} \cdots t_2 \, dt_{n-1} \cdots dt_1 = \epsilon.$

Of all regions satisfying (5.35) we want to find the most powerful one. Let
us first consider alternatives $a > 0$. If $w_a(t_n)$ denotes the common part of $w_0(t_n)$
and the region

(5.36)  $t_1 t_2 \cdots t_{n-1} \ge \frac{a}{t_n}$

we must choose $w_a(t_n)$ such that

(5.37)  $\int_{w_a(t_n)} t_{n-1}^{n-2} t_{n-2}^{n-3} \cdots t_2 \, dt_{n-1} \cdots dt_1 = \max.$

From this it follows easily that against alternatives $a > 0$ the uniformly best
choice for $w_0(t_n)$ is

(5.38)  $t_1 t_2 \cdots t_{n-1} = \frac{y_1}{y_n} \ge C'(t_n)$,

and since under H, $\frac{Y_1}{Y_n}$ is independently distributed of $T_n$, $C'(t_n)$ does not depend
on $t_n$.

Consider next alternatives $a < 0$. We include in the region of rejection all
points for which $Y_1 < 0$. To determine $w_0(t_n)$ we notice that, given $Y_1 \ge 0$,
the X's are uniformly distributed between 0 and $a + \theta$. (Provided $a + \theta > 0$;
the case $a + \theta < 0$ is trivial). Hence the probability distribution of the T's
given $Y_1 \ge 0$ is

(5.39)  $p(t_1, \dots, t_n \mid Y_1 \ge 0) = \frac{n!}{(a + \theta)^n} \, t_n^{n-1} \cdots t_2$

when

        $0 \le t_n \le a + \theta$,   $0 \le t_k \le 1$ for $k = 1, \dots, n - 1$.

Thus

(5.40)  $\frac{p(t_1, \dots, t_{n-1} \mid t_n, a < 0, Y_1 \ge 0)}{p(t_1, \dots, t_{n-1} \mid t_n, a = 0)}$

is independent of $t_1, \dots, t_{n-1}$ and hence the power of $w$ against alternatives
$a < 0$ is independent of the choice of $w_0(t_n)$. Therefore the region

(5.41)  $y_1 < 0$,   $\frac{y_1}{y_n} \ge C'$

is uniformly most powerful against all alternatives. But (5.41) is equivalent to

(5.42)  $\frac{y_1}{y_n - y_1} < 0$,   $\ge C$.
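The step from the ratio $y_1 / y_n$ to the statistic $y_1 / (y_n - y_1)$ can be checked numerically. The following is only a sketch under the assumption $a_0 = 0$; the helper names `order_ratios` and `location_statistic` and the sample values are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from math import prod


def order_ratios(xs):
    """Return [T_1, ..., T_{n-1}, T_n] of (5.32): T_k = Y_k / Y_{k+1}, T_n = Y_n."""
    ys = sorted(xs)
    return [ys[k] / ys[k + 1] for k in range(len(ys) - 1)] + [ys[-1]]


def location_statistic(xs):
    """Statistic y_1 / (y_n - y_1) of (5.42), for the case a_0 = 0."""
    ys = sorted(xs)
    return ys[0] / (ys[-1] - ys[0])


# Hypothetical positive sample; Fractions keep the arithmetic exact.
sample = [Fraction(3), Fraction(5), Fraction(4)]
ts = order_ratios(sample)
r = prod(ts[:-1])                                  # t_1 t_2 ... t_{n-1}

# The product of the ratios telescopes to y_1 / y_n, as used in (5.38).
assert r == min(sample) / max(sample)

# For 0 < y_1 < y_n the statistic of (5.42) is r / (1 - r), an increasing
# function of r, so a lower bound on one is a lower bound on the other.
assert location_statistic(sample) == r / (1 - r)
```

Because $r / (1 - r)$ increases with $r$ on $0 < r < 1$, the constant $C$ corresponds to $C' / (1 - C')$.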

It is interesting to compare this result with that for the corresponding simple
hypothesis. Let $H'$ be the hypothesis: $a = 0$ when the X's are assumed independently
and uniformly distributed over $(a, a + 1)$. There exists no uniformly
most powerful test of $H'$; instead the two uniformly most powerful one-sided
tests exist. By analogy with the normal case one might then expect for $H'$
that of all tests with symmetric power functions, there be a uniformly most
powerful one. This however is not so: there exist infinitely many admissible
tests with symmetric power function.





In this and the previous section we restricted ourselves to problems involving 
only one nuisance parameter. However, the method applies also to problems 
involving several nuisance parameters. 

In the usual way (cf. [20, 9]) the results of this section may be translated to
give optimum sets of confidence intervals for estimating the parameters in question.
In this connection it is an open question whether the confidence regions
based on the type $B_1$ tests discussed in section 2 will always be intervals; one
would expect this to be the case.

The author wishes to acknowledge his indebtedness to Professor P. L. Hsu 
for many helpful suggestions. 

Added in proof: In a joint paper by Professor Henry Scheffé and the present
author which has been submitted to the Proceedings of the National Academy of
Sciences, a result is given concerning the existence of certain 1:1 transformations.
This result bears on Section 4 of the present paper where a question arises concerning
the existence of a 1:1 transformation. The existence of such a transformation
is now assured and, as a consequence, the last paragraph of Section 4
has become superfluous.


REFERENCES 

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses", Roy. Soc. London Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.

[2] J. Neyman, "Sur la vérification des hypothèses statistiques composées", Bull. Soc.
Math. France, Vol. 63 (1935), p. 1.

[3] J. Neyman, "On a statistical problem arising in routine analysis and in sampling inspection
of mass production", Annals of Math. Stat., Vol. 12 (1941), p. 46.

[4] H. Scheffé, "On the theory of testing composite hypotheses with one constraint",
Annals of Math. Stat., Vol. 13 (1942), p. 280.

[5] P. L. Hsu, "Analysis of variance from the power function standpoint", Biometrika,
Vol. 32 (1941), p. 62.

[6] J. B. Simaika, "On an optimum property of two important statistical tests", Biometrika,
Vol. 32 (1941), p. 70.

[7] A. Wald, "On the power function of the analysis of variance test", Annals of Math.
Stat., Vol. 13 (1942), p. 434.

[8] P. L. Hsu, "On the power function of the $E^2$-test and the $T^2$-test", Annals of Math.
Stat., Vol. 16 (1945), p. 278.

[9] H. Scheffé, "On the ratio of the variances of two normal populations", Annals of
Math. Stat., Vol. 13 (1942), p. 371.

[10] J. Neyman, "L'estimation statistique traitée comme un problème classique de probabilité",
Actualités Sci. et Ind. No. 739, Hermann et Cie, Paris, 1938.

[11] A. Wald, "Notes on the theory of statistical estimation and testing hypotheses",
Columbia University.

[12] W. J. Dixon, "Further contributions to the problem of serial correlation", Annals of
Math. Stat., Vol. 15 (1944), p. 119.

[13] J. von Neumann, "Distribution of the ratio of the mean square successive difference
to the variance", Annals of Math. Stat., Vol. 12 (1941), p. 367.

[14] M. Kac, "A remark on independence of linear and quadratic forms involving independent
Gaussian variables", Annals of Math. Stat., Vol. 16 (1945), p. 400.

[15] W. G. Madow, "Note on the distribution of the serial correlation coefficient", Annals
of Math. Stat., Vol. 16 (1945), p. 308.

[16] R. L. Anderson, "Distribution of the serial correlation coefficient", Annals of Math.
Stat., Vol. 13 (1942), p. 1.

[17] T. Koopmans, "Serial correlation and quadratic forms in normal variables", Annals of
Math. Stat., Vol. 13 (1942), p. 14.

[18] R. B. Leipnik, "Distribution of the serial correlation coefficient in a circularly correlated
universe", Annals of Math. Stat., Vol. 18 (1947), p. 80.

[19] P. L. Hsu, "On the asymptotic distributions of certain statistics used in testing the
independence between successive observations from a normal population",
Annals of Math. Stat., Vol. 17 (1946), p. 350.

[20] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability", Roy. Soc. London Phil. Trans., Ser. A, Vol. 236 (1937), p. 333.

[21] E. J. G. Pitman, "Sufficient statistics and intrinsic accuracy", Proc. Camb. Phil. Soc.,
Vol. 32 (1936), p. 567.

[22] B. O. Koopman, "On distributions admitting a sufficient statistic", Trans. Amer.
Math. Soc., Vol. 39 (1936), p. 399.

[23] D. Dugué, "Application des propriétés de la limite au sens du calcul des probabilités
à l'étude des diverses questions d'estimation", J. École Poly., Vol. 3 (1937),
p. 305.

[24] J. Neyman and E. S. Pearson, "On the use and interpretation of certain test criteria
for purposes of statistical inference", Biometrika, Vol. 20A (1928), p. 175.

[25] E. Paulson, "On certain likelihood ratio tests associated with the exponential distribution",
Annals of Math. Stat., Vol. 12 (1941), p. 301.

[26] P. V. Sukhatme, "Tests of significance for samples of the $\chi^2$ population with two degrees
of freedom", Annals of Eugenics, Vol. 8 (1937), p. 52.



A CORNER TEST FOR ASSOCIATION 
By Paul S. Olmstead and John W. Tukey 
Bell Telephone Laboratories and Princeton University 

1. Summary. This paper proposes a new test (the “quadrant sum”) for 
the association of two continuous variables. Its notable properties are: 

(1) Special weight is given to extreme values of the variables. 

(2) Computation is very easy. 

(3) The test is non-parametric. 

Significance levels (for the quadrant sum) are given to the accuracy needed for
practical use. To this accuracy they are independent of sample size (see Fig. 1).
The generating function of the quadrant sum is given for the null hypothesis
(no association = independence). A limiting distribution is deduced and compared
with the cases 2n = 4, 6, 8, 10, and 14. Extension to higher dimensions
and application to serial correlation are discussed.

2. Description of test (even number in sample). We shall describe the 
test as though a scatter diagram had already been drawn. The possibilities of 
direct computation from tabular data are indicated by the examples in sections 
8 and 9. 

In the scatter diagram, draw the two lines, $x = x_m$, $y = y_m$, where $x_m$ is the
median of the x-values without regard to the values of y, and $y_m$ is the median
of the y-values without regard to the values of x. Think of the four quadrants
or corners thus formed as being labelled +, -, +, -, in order, so that the upper
right and lower left quadrants are positive. Beginning at the right hand side
of the diagram, count in (in order of abscissae) along the observations until
forced to cross the horizontal median. Write down the number of observations
met before this crossing, attaching the sign + if they lay in the + quadrant,
and the sign - if they lay in the - quadrant. Repeat this process moving
up from below, moving to the right from the left, and moving down from above.
The quadrant sum is the algebraic sum of the four terms thus written down.
This process is illustrated in Fig. 2, where the black dots represent contributions
to the sum, and the dotted lines, crossings.
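The counting procedure just described can be sketched in code. This is a minimal illustration only, assuming an even number of pairs and no ties; the function name `quadrant_sum` and the small data set are hypothetical and are not taken from the paper.

```python
def quadrant_sum(points):
    """Quadrant sum for an even number of (x, y) pairs without ties (sketch).

    Returns the sum together with the four individual terms.
    """
    m = len(points)
    if m == 0 or m % 2:
        raise ValueError("this sketch needs an even, positive number of pairs")
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    # With 2n points and no ties the median lines pass between the points.
    x_med = (xs[m // 2 - 1] + xs[m // 2]) / 2
    y_med = (ys[m // 2 - 1] + ys[m // 2]) / 2

    def sign(p):
        # + in the upper right and lower left quadrants, - in the other two.
        return 1 if (p[0] > x_med) == (p[1] > y_med) else -1

    def run(ordered, side):
        # Count points from the outside in until the other median is crossed.
        first = side(ordered[0])
        count = 0
        for p in ordered:
            if side(p) != first:
                break
            count += 1
        return sign(ordered[0]) * count

    def above(p):
        return p[1] > y_med

    def right(p):
        return p[0] > x_med

    by_x = sorted(points, key=lambda p: p[0])
    by_y = sorted(points, key=lambda p: p[1])
    terms = {
        "right": run(by_x[::-1], above),   # moving in from the right
        "left": run(by_x, above),          # moving in from the left
        "top": run(by_y[::-1], right),     # moving down from above
        "bottom": run(by_y, right),        # moving up from below
    }
    return sum(terms.values()), terms


# Hypothetical data: x = 1..6 paired with a permutation of 1..6.
total, terms = quadrant_sum(list(zip(range(1, 7), [2, 5, 1, 6, 3, 4])))
print(total, terms)
```

For a perfectly increasing arrangement of 2n points each of the four runs has length n, so the sketch returns 4n; a perfectly decreasing arrangement gives -4n.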

When there are an even number of pairs (x, y) and no ties, the medians will 
pass between the points. In this, the simplest case, the distribution of the 
quadrant sum is known for the hypothesis of no association (that is, of inde¬ 
pendence), and significance levels are given in Table 1 for the magnitude (abso¬ 
lute value) of the sum. It will be noticed that the sample size does not enter in 
any important way. 

The cases of an odd number of observations and of ties are discussed in the
next two sections. Simple devices make the test usable in most cases. A very
great tendency toward ties, however, will make it inapplicable. This will be
unimportant in most applications because of the fact that attention is being
directed to the periphery.

[Fig. 1. Horizontal axis: absolute value of quadrant sum.]


Fig. 2. Scatter diagram of 116 pairs of observations. Individual terms: top = +3,
right = +1, bottom = +6, left = +6 1/2; quadrant sum |S| = 16 1/2, P ≈ 0.5%.

The set of data which prompted the development of the test is shown in Fig. 2.
The accompanying report described it as follows: "The various points appear
to be scattered almost completely at random and give little indication of
correlation." The quadrant sum is 16 1/2, which is significant at the 0.5% point.
Intuitively, the significant association of the peripheral points is clear.


3. Description of test (odd number in sample). If the sample size is odd,
then we may usually follow the process outlined above. We will have difficulty
only when the counting process meets a point, one of whose coordinates is a
median. In this case we employ a simple device, namely:

Given a sample of 2n + 1 pairs, let $x^*$ and $y^*$ be the medians of the x-values
and of the y-values, respectively. Let the pairs in which they occur be $(x^*, y_k)$
and $(x_m, y^*)$, respectively. Replace these two pairs by the single pair $(x_m, y_k)$.
There are now 2n pairs and the regular method can be applied.

The quadrant sum so obtained from an unassociated population has the same
distribution as that formed directly from 2n pairs.
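The device for odd samples can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure in code: the name `reduce_odd_sample` is hypothetical, ties are assumed absent, and the handling of the case where both medians fall in the same pair (that pair is simply dropped) is an assumption, since the text does not discuss it.

```python
def reduce_odd_sample(pairs):
    """Turn 2n + 1 pairs into 2n pairs by merging the two median pairs (sketch)."""
    m = len(pairs)
    if m % 2 == 0:
        raise ValueError("expected an odd number of pairs")
    x_star = sorted(p[0] for p in pairs)[m // 2]             # median of the x-values
    y_star = sorted(p[1] for p in pairs)[m // 2]             # median of the y-values
    with_x_med = next(p for p in pairs if p[0] == x_star)    # the pair (x*, y_k)
    with_y_med = next(p for p in pairs if p[1] == y_star)    # the pair (x_m, y*)
    rest = [p for p in pairs if p is not with_x_med and p is not with_y_med]
    if with_x_med is with_y_med:
        # ASSUMPTION: if one pair holds both medians, drop it, leaving 2n pairs.
        return rest
    # Replace the two median pairs by the single synthetic pair (x_m, y_k).
    return rest + [(with_y_med[0], with_x_med[1])]


# Hypothetical sample of 5 pairs: x-median 3 sits in (3, 1), y-median 3 in (1, 3).
reduced = reduce_odd_sample([(1, 3), (2, 5), (3, 1), (4, 2), (5, 4)])
print(reduced)
```

In the example the synthetic pair is (1, 1): the x-value that accompanied the y-median joined with the y-value that accompanied the x-median.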

4. Description of test (treatment of ties). The behavior of the test is known
when (1) there is no association, (2) the probability of a tie in x-values or y-values
is zero. The following approximation, which has an unknown effect on the
distribution, is suggested when ties are present:

When a tied group is reached, count the number in the tied group favorable
to continuing and the number unfavorable. Treat the tied group as if the
number of its points preceding the crossing of the median were

        (number favorable) / (1 + number unfavorable).

It seems likely that this approximation is conservative.

TABLE 1

Working significance levels for magnitudes of quadrant sums

Significance level (conservative)    Magnitude of quadrant sum*
            10%                                  9
             5%                                 11
             2%                                 13
             1%                               14-15
           0.5%                               15-17
           0.2%                               17-19
           0.1%                               18-21

* The smaller magnitude applies for large sample size, the larger magnitude
for small sample size. Magnitudes equal to or greater than twice the sample
size less six should not be used.

5. Discussion. When a moderate number, say 25 to 200, of paired observations
on two quantities are plotted as a scatter diagram, visual examination
frequently detects what seems to be definite evidence of association between
the variables. Often in such cases, the usual methods for measuring association
do not find statistical significance of association. Visual judgment, particularly
by engineers or scientists who may wish to take action on the basis
of their findings, gives greater weight to observations near the periphery of the
scatter diagram. This is not always desirable, but often it is very desirable.
A quantitative test of association with such concentration on the periphery has
been lacking. The quadrant sum test was developed to fill the gap. Its features
of speed and non-parametricity are useful but secondary from this point
of view.

When uniform attention to the whole scatter diagram is desired, the quadrant 
sum test is of unknown usefulness. We know little enough of the operating 
characteristics of the more conventional tests, such as: 

1. The product moment correlation coefficient 

2. The four-fold table formed by the medians 

3. The biserial correlation coefficient 

4. The rank correlation coefficient 

and less about the operating characteristics of the present test. In this case, 
the quadrant sum test can only be recommended definitely for exploratory 
investigations of large amounts of data. 

There are many situations, however, where we do not know where to concen¬ 
trate our attention, and where speed and non-parametricity are cardinal virtues 
in a test. One example is the use of serial correlation in studying industrial 
processes. We may guess that here we are interested in the periphery, but 
neither theory nor experience can, so far, prove this. In such situations the 
quadrant sum is by far the fastest to use of any of the tests known to the authors, 
and we believe one of the most useful. 

6. Elementary derivations. We can easily find the distribution of 

1. An individual term of the quadrant sum 

a. For fixed sample size 

b. In the limit 

2. The quadrant sum itself 

a. For fixed sample size 

b. In the limit, assuming asymptotic independence of the four terms. 

This we shall do now, leaving the proof that 2a actually converges to 2b to a 
later section. 

Consider a sample of 2n pairs $(x_1, y_1), \dots, (x_{2n}, y_{2n})$ from a population in
which x and y are independent. It is both clear and easily verifiable that

1. The set of 2n x-values, $x_1, \dots, x_{2n}$

2. The set of 2n y-values, $y_1, \dots, y_{2n}$

3. The permutation of the order of the y-values when the pairs are ordered
by the x-values

which together determine the sample, are independently distributed, and that
any permutation is as likely as every other. (We have assumed no ties, which
is a consequence, with probability one, of the continuous cumulative distributions
of x and y). Since the quadrant sum depends only on the permutation,
its distribution in the absence of association does not depend on the distributions
of x and y.
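Because only the permutation matters, the null distribution for a small sample can be obtained by brute force. The sketch below is illustrative only: it assumes x = 1, ..., 2n with y a permutation of the same values, and the names `quadrant_sum_of_permutation` and `tail_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def quadrant_sum_of_permutation(perm):
    """Quadrant sum when x = 1..2n and y = perm, a permutation of 1..2n."""
    m = len(perm)
    n = m // 2                      # both medians lie between n and n + 1
    pts = list(zip(range(1, m + 1), perm))

    def run(ordered, coord):
        # Side of the other median for each point, scanned from the outside in.
        upper = [p[coord] > n for p in ordered]
        count = 1
        while count < m and upper[count] == upper[0]:
            count += 1
        first = ordered[0]
        sign = 1 if (first[0] > n) == (first[1] > n) else -1
        return sign * count

    by_x = pts                                    # already ordered by x
    by_y = sorted(pts, key=lambda p: p[1])
    return (run(by_x, 1) + run(by_x[::-1], 1)
            + run(by_y, 0) + run(by_y[::-1], 0))


def tail_probability(two_n, k):
    """Exact Pr(|quadrant sum| >= k) when all (2n)! permutations are equally likely."""
    perms = list(permutations(range(1, two_n + 1)))
    hits = sum(1 for p in perms if abs(quadrant_sum_of_permutation(p)) >= k)
    return Fraction(hits, len(perms))


print(tail_probability(4, 3), tail_probability(6, 12))
```

Enumeration is practical only for small 2n, since the number of permutations grows as (2n)!. For 2n = 4 the sketch gives 5/12 for |S| ≥ 3, and for 2n = 6 it gives 1/10 for |S| ≥ 12.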

We must solve, then, certain purely combinatorial problems, under the
hypothesis that the (2n)! permutations of the y-values are all equally likely.
It may simplify matters to assume that the values of x in the sample are 1, 2, ...,
2n and that those of y are the same. How, then, do we calculate the distribution
of a single term of the quadrant sum? Let us begin with small x-values,
and the pair $(1, y_1)$. If $y_1 = 1, 2, \dots, n$, we count "one" positive, and if
$y_1 = n + 1, n + 2, \dots, 2n$, we count "one" negative. We pass on to $(2, y_2)$
and so on. How many permutations yield a count of exactly k positive values?
Those in which $y_1, y_2, \dots, y_k$ are equal to or less than n, $y_{k+1}$ equal to or greater
than n + 1, and the other (2n - k - 1) y's are arbitrary. There are:

        $n(n - 1) \cdots (n - k + 1) \cdot (n)(2n - k - 1)!$

such permutations, the fraction of all (2n)! permutations being:

(1)     $\frac{n(n - 1) \cdots (n - k + 1) \, n}{(2n)(2n - 1) \cdots (2n - k + 1)(2n - k)}$

which is, then, the probability that this contribution will equal +k, or by symmetry,
the probability that it will equal -k, $k \ne 0$.

For large n, this becomes merely:

(2)     $p_k = 2^{-(|k| + 1)}$,   $k \ne 0$.
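Formula (1) and its limit (2) can be compared numerically. The following is a hedged sketch; the function name `single_term_probability` is hypothetical, and the large value of n is chosen only to make the limit visible.

```python
from fractions import Fraction


def single_term_probability(n, k):
    """Formula (1): probability that one term equals +k for a sample of 2n pairs."""
    if not 1 <= k <= n:
        return Fraction(0)
    p = Fraction(n, 2 * n - k)            # factor for y_{k+1} >= n + 1
    for i in range(k):                    # factors for y_1, ..., y_k <= n
        p *= Fraction(n - i, 2 * n - i)
    return p


# The positive values of one term carry total probability 1/2 (the first
# point lies on the positive side with probability 1/2).
for n in (1, 2, 5, 20):
    assert sum(single_term_probability(n, k) for k in range(1, n + 1)) == Fraction(1, 2)

# For large n the probability of +k approaches 2^-(k+1), formula (2).
for k in (1, 2, 3, 6):
    exact = float(single_term_probability(10**6, k))
    assert abs(exact - 2.0 ** -(k + 1)) < 1e-5
```

The first check confirms that (1) is a proper distribution over k = 1, ..., n on the positive side; the second shows how close the finite-sample value is to the limiting value for a very large sample.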

In order to obtain the distribution of the quadrant sum itself, we must concern
ourselves with the lack of independence of the four terms. This is indicated
most clearly in the case of 2n = 2, where the 2! = 2 permutations yield
+1 +1 +1 +1 = 4 and -1 -1 -1 -1 = -4. Here, there is complete lack
of independence. We shall see later that there is effectively independence in
the limit, so that it is worth while to calculate the sum of four independent
terms with the limiting distribution (2) and find that it satisfies:

(3)     $\Pr(|\text{independent sum of 4 terms}| \ge k) = \frac{9k^3 + 9k^2 + 168k + 208}{216 \cdot 2^k}$,   $k > 0$.

The details will be omitted.
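Since the details are omitted, a numerical check of (3) may be helpful. The sketch below convolves four independent copies of the limiting distribution (2) and compares the tail with the closed form (the same expression is given in the footnote of Table 2). The truncation point `K` and all function names are assumptions made for this illustration.

```python
K = 80  # truncation point (assumption); the neglected tail mass is of order 2**-80


def term_pmf():
    """Limiting distribution (2): p_k = 2^-(|k|+1) for k != 0, truncated at |k| <= K."""
    return {k: 2.0 ** -(abs(k) + 1) for k in range(-K, K + 1) if k != 0}


def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def tail(dist, k):
    """Pr(|S| >= k) for a distribution given as {value: probability}."""
    return sum(pr for s, pr in dist.items() if abs(s) >= k)


def closed_form(k):
    """Right-hand side of (3)."""
    return (9 * k**3 + 9 * k**2 + 168 * k + 208) / (216 * 2**k)


two = convolve(term_pmf(), term_pmf())
four = convolve(two, two)            # sum of four independent terms
for k in range(1, 31):
    assert abs(tail(four, k) - closed_form(k)) < 1e-9
```

For k = 1 both sides equal 394/432, about 0.912037, which is the first entry of the limiting column of Table 2.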

A simple device, reminiscent of Wald’s [3, 1943] establishment of the two- 
dimensional tolerance limits enables us to avoid difficulties with lack of inde¬ 
pendence and compute the exact distribution of the quadrant sum for any n. 
We decompose the permutation of the y-values into the following parts, which 
together specify the permutation: 

(a) The number, j, of pairs in the upper right quadrant.

(b) The set of j values of x between n + 1 and 2n corresponding to pairs in
the upper right quadrant.

(c) The set of j values of y between n + 1 and 2n corresponding to points
in the upper right quadrant.

(d) The set of j values of x between 1 and n corresponding to pairs in the 
lower left quadrant. (Note that the use of medians ensures that the 
lower left and upper right quadrants contain the same number of points.) 

(e) The set of j values of y between 1 and n corresponding to pairs in the 
lower left quadrant. 

(f) The permutation of j objects defined by the pairs in the upper right 
quadrant. 

(g) The permutation of n — j objects defined by the pairs in the upper left 
quadrant. 

(h) and (i) the permutations from the remaining quadrants. 

It is easily verified that: (1) given j, items (b) to (i) can be assigned at will, (2)
each assignment of (a) to (i) corresponds to one and only one permutation, (3) the
quadrant sum depends only on items (b) to (e). In fact, the right hand term
depends on item (b), the upper term on item (c), the left hand term on item (d)
and the lower term on item (e). While j remains fixed, the terms behave
independently.

For fixed j, what is the distribution of a single term? If a set of j x-values
gives the term +k, it must contain the k largest x-values and not contain the
next. There are:

        $\binom{n - k - 1}{j - k}$

such sets. The generating function for a single term, is, then:


(4) 






Since the terms are independent for fixed j, and there are $(j!)^2 ((n - j)!)^2$
ways to supply the permutations forming items (f) to (i), the generating function
for the quadrant sum, $S_n$, is:

(5)     $Q_n(x) = \sum_{j=0}^{n} \frac{(j!)^2 ((n - j)!)^2}{(2n)!} \cdots$





The exact probability of equalling or exceeding each value of $S_n$ has been
computed for 2n = 2, 4, 6, 8, 10, and 14. Table 2 gives these probabilities
and Fig. 3 shows the values of

        $\frac{m}{5} + \log_{10} \Pr(|\text{quadrant sum}| \ge m)$

this particular function being chosen for its relative constancy. The maximum
value of the quadrant sum is 4n, and for values of k less than 4n - 6, there
is quite good agreement between the curves for finite n and formula (3) at
the practically significant percentage points. The situation for very small
probabilities suggests a careful consideration of the limiting behavior of the
quadrant sum distribution (see section 10).

TABLE 2

Probability of a Sum of Absolute Value Equal to or Greater than k when a Sample
of 2n is Drawn from an Unassociated Population

|k| \ 2n               2         4         6         8        10        12        14        ∞*
0                 1.0000
1                 1.0000              0.9333    0.9036    0.9106              0.9115  0.912037
2                 1.0000              0.7556    0.7544    0.7567              0.7580  0.754630
3                 1.0000    0.4167                                                    0.599537
4                 1.0000    0.4167    0.4667    0.4619                                0.462963
5                 0.0000    0.3333    0.3111              0.3519              0.3547  0.346933
6                 0.0000    0.3333    0.2222    0.2619                        0.2611  0.252025
7                 0.0000    0.3333    0.1556    0.1821    0.1867              0.1876  0.177662
8                 0.0000    0.3333    0.1111    0.1258                                0.121817
9                 0.0000    0.0000    0.1000              0.0928              0.0918  0.081471
10                0.0000    0.0000    0.1000    0.0554    0.0642                      0.053295
11                0.0000    0.0000    0.1000    0.0375    0.0436              0.0432  0.034189
12                0.0000    0.0000    0.1000                                  0.0296  0.021557
13                0.0000    0.0000    0.0000                                  0.0202  0.013386
14                0.0000    0.0000    0.0000              0.0127              0.0139  0.008200
15                0.0000    0.0000    0.0000              0.0095              0.0096  0.004963
16                0.0000    0.0000    0.0000              0.0083              0.0066  0.002972
17                0.0000    0.0000    0.0000              0.0079              0.0045  0.001762
18                0.0000    0.0000    0.0000              0.0079              0.0031  0.001036
19                0.0000    0.0000    0.0000              0.0079              0.0021  0.000604
20                0.0000    0.0000    0.0000              0.0079              0.0014  0.000350
21                0.0000    0.0000    0.0000                                  0.0010  0.000201
22                0.0000    0.0000    0.0000                                  0.0008  0.000115
23                0.0000    0.0000    0.0000                                  0.0006  0.000065
24                0.0000    0.0000    0.0000                                  0.0006  0.000036
25                0.0000    0.0000    0.0000                                  0.0006  0.000020
26                0.0000    0.0000    0.0000                                  0.0006  0.000011
27                0.0000    0.0000    0.0000                                  0.0006  0.000006
28                0.0000    0.0000    0.0000                                  0.0006  0.000003
29                0.0000    0.0000    0.0000                                  0.0000  0.000002
30                0.0000    0.0000    0.0000                                  0.0000  0.000001
31 or over        0.0000              0.0000    0.0000    0.0000              0.0000  0.000000
Variance of S         16        24       26+       26+       26+       26+        26        24

* Probability for 2n = ∞, k > 0, is given by

        $\frac{9k^3 + 9k^2 + 168k + 208}{216 \cdot 2^k}$

(Blank entries, and the fractional parts of the variances marked +, are illegible in the source copy.)

Fig. 3. Comparative relationships for finite and infinite sample sizes and
normal approximation to the infinite sample size. Horizontal axis: absolute
value of quadrant sum.







The device for samples of 2n + 1 deserves a word of justification. If there
is no association, the 2n + 1 y-values are randomly paired with the 2n + 1
x-values, and, in particular, the y-value paired with the x-median is randomly
selected. If we pair it with the (randomly selected) x-value which was paired
with the y-median we still have random pairing. The pairing of the 2n pairs
is random, although neither the x-values nor the y-values make up a sample.
The randomness of pairing is all that has been used in the discussions of this
section.

7. Extension to higher dimensions. The same ideas that underlie the quad¬ 
rant sum test for two variables may be extended in several ways to give tests 
for various types of association among three or more variables. Only one 
three-variable case will be discussed here, leaving further extension to the 
reader. 

Given three variables, x, y, and z, and a sample of matched observations on
these, it is clearly possible to use the simple quadrant sum test for two variables 
to investigate association between x and y separately, between y and z separately, 
and between z and x separately. If the Pearson coefficient of correlation were 
being computed and were found to be close to zero for each of these pairs, it 
would be assumed that there was no detectable association through the second 
moments. In a trivariate normal or Gaussian distribution, where the first and 
second moments determine the whole distribution, if there is independence be¬ 
tween the separate pairs of variables, there is no possibility of a three-way 
association. It is of some interest, however, to notice that a corner sum test 
can be devised that will measure the effect of such triple association in case it 
does exist. 

Consider the octants into which the three median planes for x, y, and z,
respectively, divide the three dimensional scatter diagram and label the octants 
alternately plus and minus, in the manner suggested by Fig. 4. More precisely, 
an octant is counted as plus if an odd number, that is three or one, of the vari¬ 
ables are greater than the medians of the sample, and the remaining octants are 
labelled minus. It is clear that we may repeat the process of coming in along 
each axis passing from observation to observation as long as they remain in a 
region of fixed sign, and writing down as a contribution to the final or octant 
sum the number of such consecutive elements and the sign of the region in which 
they were found. There will be six terms rather than four, as was the case 
for the test based on quadrants, and so a new set of significance levels will be 
required. Table 3, following, lists the situation for a very large sample. 
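The counting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `corner_sum` is hypothetical, the sample is assumed to have no ties and no observation on a median, and each run is stopped at the median of the axis along which we come in (the layout of the auxiliary tables in section 8). The sign of a region is taken as the product of the signs of the deviations from the medians, which gives the usual quadrant labelling in two dimensions and the odd-number rule in three.

```python
from statistics import median


def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def corner_sum(points):
    """Quadrant, octant, ... sum for a list of equal-length tuples (sketch)."""
    dims = len(points[0])
    meds = [median(p[i] for p in points) for i in range(dims)]
    # Sign of every coordinate relative to its sample median.
    signs = [[sign(p[i] - meds[i]) for i in range(dims)] for p in points]
    # Sign of the region holding each point: product of its coordinate signs.
    region = []
    for s in signs:
        prod = 1
        for v in s:
            prod *= v
        region.append(prod)

    total = 0
    for axis in range(dims):
        for side in (+1, -1):
            # Points on this side of the axis median, extreme point first.
            order = [k for k in range(len(points)) if signs[k][axis] == side]
            order.sort(key=lambda k: points[k][axis], reverse=(side > 0))
            run = 0
            for k in order:
                if region[k] == 0 or region[k] != region[order[0]]:
                    break
                run += 1
            if run:
                total += region[order[0]] * run
    return total


print(corner_sum([(i, i) for i in range(1, 9)]))     # two variables
print(corner_sum([(i, i, i) for i in range(1, 9)]))  # three variables
```

For eight points on the diagonal the two-variable sum is +16 (a run of four from each of the four sides). The same points in three dimensions give 0, because the runs from opposite ends of each axis lie in octants of opposite sign.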

The situation has been sketched for the case of 2n triples. If there are 2n + 1
triples, then we may have trouble with the medians again. However, a similar
device works, except that we must agree on a last variable in order to form the
synthetic triples uniquely. For example, consider the triples (m, 3, 5), (9, m, 1),
(12, 4, m), where m denotes the median. Taking the order in which the variables
are written, we get (12, 3, 5) and (9, 4, 1) as the synthetic triples. Other



Fig. 4. Octant schematic—solid sections taken as positive 


TABLE 3

Working significance levels for the magnitudes of the octant sum

Significance Level     Magnitude of Octant Sum*
      10%                       11
       5%                       13
       2%                       15
       1%                       16
       0.5%                     18
       0.2%                     20
       0.1%                     21

* Computed for large samples only and based on normal approximation; see
section 11 for discussion of this and higher dimensional cases.



orders would yield (9, 3, 5) and (12, 4, 1) or (9, 3, 1) and (12, 4, 5). This slight
dissymmetry is not pleasing but should give no difficulty.

8. Nongraphical example. The following example of 78 successive observa¬ 
tions of four variables shows how this test may be applied without plotting and 
how simple the computation still remains. The data concern a metallurgical 


TABLE 4

Excerpt from Tippett's Table

Time T*    Fuel F*    Material M*    Articles A*    Duration D*
   1 -      246 +       1457 -         1895 +
   2 -      196 -       2078 +         2121 +
   3 -      192 -       1278 -         1437 -
   4 -      202 +       1398 -         1497 -
   5 -      206 +       1944 +         1592 +
   6 -      218 +       1464 -         1506 -
   7 -      155 -       1541 +         1762 +
   8 -      201 +       1502 +         1818 +
   9 -      211 +       1950 +         1144 -
  10 -      236 +       1768 +         1654 +
 etc. to
  78 +      185 -       1536 +         1442 -

 Median     Median      Median         Median         Median
  39.5       199         1474           1588           149.5

* Location of observation relative to column median; + = above; - = below.

Tippett's correlations (based on lightly rounded data)

r_FM   = +0.243
r_FA   = +0.266
r_MA   = +0.681
r_FM.A = +0.088
r_FA.M = +0.141

problem in mass production and are taken from L. H. C. Tippett, Table XXII,
page 63 [2]. An excerpt from the data is given in Table 4 together with Tippett's
calculated correlations. This table also shows the preliminary marking
of each individual measurement as above (+), below (-), or on (0) the median
for its variable. From this table we see, for example, that increasing T contributes
a term -3 to the quadrant sum for T and D. It is often desirable to
prepare auxiliary tables to assist in computing the components of the quadrant





CORNER TEST FOR ASSOCIATION 




and hyperquadrant sums. Such a table is Table 5 for low values of Fuel (F-)
arranged in consecutive ascending numerical order. The entries on this table
for the five columns headed F, T, M, A, and D are directly comparable to the
entries in Table 4. For example, F = 155 is - with respect to the fuel median
and T = 7, -; M = 1541, +; A = 1702, +; D = 152, +. The double, triple,
quadruple and quintuple headed columns contain simply the algebraic multiplication
of the signs in the appropriate T, M, A, or D columns. Thus, TM
for F = 155 is -, MAD is +, and TMAD is -. The contribution to each
quadrant or hyperquadrant sum is simply the count of the consecutive like
signs from the top of a column. For column AD, we have 7 consecutive +
signs and since the contribution is to FAD and F is -, the contribution in this
case to the octant sum is -7. The results from the ten tables of which Table 5

TABLE 5

Sample Table for One Component of Quadrant and Hyperquadrant Sums. Low
Values of Fuel (F-)

Fuel F    T  M  A  D    TM  TA  TD  MA  MD  AD    TMA  TMD  TAD  MAD    TMAD
  98 -    +  -  -  -    -   -   -   +   +   +     +    +    +    -      -
 135 -    +  -  -  -    -   -   -   +   +   +     +    +    +    -      -
 140 -    -  -  -  -    +   +   +   +   +   +     -    -    -    -      +
 146 -    -  -  -  -    +   +   +   +   +   +     -    -    -    -      +
 147 -    +  +  -  -    +   -   -   -   -   +     -    -    +    +      +
 149 -    -  +  -  -    -   +   +   -   -   +     +    +    -    +      -
 151 -    +  -  -  -    -   -   -   +   +   +     +    +    +    -      -
 153 -    +  -  +  -    -   +   -   -   +   -     -    +    -    +      +
 155 -    -  +  +  +    -   -   -   +   +   +     -    -    -    +      -

Contributions to Sums

FT   FM   FA   FD   FTM  FTA  FTD  FMA  FMD  FAD  FTMA  FTMD  FTAD  FMAD  FTMAD
-2   +4   +7   +8   +2   +2   +2   -4   -4   -7   -2    -2    -2    +4    +2


is a sample are then carried to the summary computation shown in Table 6. 
The contribution from Table 5 is shown on line F—. The totals are computed 
and their probabilities of occurrence determined. 
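The bookkeeping of Table 5 is mechanical enough to be sketched in code. The following Python fragment is an illustration only (the names `rows`, `leading_run` and `contrib` are hypothetical): it takes the T, M, A and D signs of the nine lowest Fuel values, forms every product column, counts the run of like signs from the top, and multiplies by the sign of the marginal variable F, which is minus throughout this table.

```python
from itertools import combinations

# Signs of T, M, A, D relative to their medians, in ascending order of Fuel.
rows = {
    98: "+---",
    135: "+---",
    140: "----",
    146: "----",
    147: "++--",
    149: "-+--",
    151: "+---",
    153: "+-+-",
    155: "-+++",
}
names = "TMAD"
margin_sign = -1  # every Fuel value in this table lies below the Fuel median


def leading_run(column):
    """Signed length of the run of like signs at the top of a column."""
    count = 0
    for s in column:
        if s != column[0]:
            break
        count += 1
    return column[0] * count


contrib = {}
for size in range(1, 5):
    for combo in combinations(range(4), size):
        column = []
        for pattern in rows.values():
            prod = 1
            for i in combo:
                prod *= 1 if pattern[i] == "+" else -1
            column.append(prod)
        label = "F" + "".join(names[i] for i in combo)
        contrib[label] = margin_sign * leading_run(column)

for label, value in contrib.items():
    print(f"{label:6s} {value:+d}")
```

Run on these nine rows, the sketch gives the same fifteen contributions as the last line of Table 5, from FT = -2 through FTMAD = +2.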

9. Serial example. The following example, a sample of 144 observations of 
the thickness of inlay for relay springs cut consecutively from a single sheet of 
material, allows us to compare the resolution of the present test with that of 
the serial product-moment correlation. The data are from Shewhart [1, 1941, 
Table 1] and the serial correlations from lag 1 to lag 22 are from recent calcu¬ 
lations by Miss Dorothy T. Angell. The procedure for calculating the serial 
quadrant sums is similar to that for obtaining the sums for section 8. A table 
is prepared to show the observed consecutive order of the numerical values and 
each is identified as above (+), below (—), or on the median (0). This gives a 






TABLE 6

Summary Computation Table for Quadrant and Hyperquadrant Sums

[Table body not legible. The surviving row labels are the significance
markers: Significant at 5%, Significant at 1%, Significant at 0.2%.]



































































table similar to one of the elements, say Fuel, in Table 4. Four computation 
tables similar to Table 5 are required, one for the equivalent of moving from 
the right, one from below, one from the left, and one from the top of a lag cor¬ 
relation scatter diagram. One table from each direction will take care of all 
lags. In the first, the marginal entries are the observed values listed in descend¬ 
ing numerical order. Opposite these are recorded from the previous table the 
signs associated with observations for each lag with respect to each entry. 
The second table would record the signs relating to the lags from the observed 
values arranged in ascending order. The third table would record the signs 
relating to leads from the observed values arranged in ascending order and the 
fourth, the signs relating to leads from the observed values arranged in descend¬ 
ing order. The sign of the contribution from each group is the algebraic product 
of the sign of the run and the sign of the marginal entries. The length of run 
is determined in the same way as in Table 5. Table 7 illustrates the procedure 


TABLE 7

Relation of Lagged Observations to Median (+ = above, - = below) for Smallest
Observations in Ascending Order

[Table body not legible. Layout: one row for each of the eight smallest
thickness values (2, 8, 8, 10, 13, 17, 17, 18); one column for each lag from
0 to 25; each cell a + or - sign, with circular items in parentheses; and a
final row marked * giving the contribution of each lag to the serial
quadrant sum.]

* Contribution to Serial Quadrant Sum.



of determining the contribution from lags associated with the observations 
arranged in ascending order. 

Two serial quadrant sums may be computed—a circular serial quadrant sum 
or a noncircular serial quadrant sum. Circular items arise from considering 
that the beginning of the set of observations is a continuation of the end in the 
same way that this assumption is made in computing circular serial correlation 
coefficients. In Table 7, circular items are shown in parentheses and are omitted 
in calculating noncircular sums. In the particular table shown, the count of
the run lengths was identical for both types of sum, but this need not hold
in other cases. Since the serial quadrant sum is relatively insensitive to
sample size, the noncircular serial quadrant sum has for all practical purposes 
the same distribution as the circular quadrant sum. The correspondence in 
this case between the serial correlation coefficient for each lag up to 22 and 
the respective values of the two types of serial quadrant sums is shown in Fig. 5. 
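The difference between the circular and noncircular versions lies only in how the lagged pairs are formed. A minimal Python sketch, with the hypothetical helper name `lagged_pairs`, is given below. The pairs it returns would then be treated as the (x, y) points of an ordinary quadrant sum.

```python
def lagged_pairs(series, lag, circular=False):
    """Pair each observation with the one `lag` places later.

    With circular=True the beginning of the series is treated as a
    continuation of the end, so every observation has a partner.
    """
    n = len(series)
    if circular:
        return [(series[t], series[(t + lag) % n]) for t in range(n)]
    return [(series[t], series[t + lag]) for t in range(n - lag)]


thickness = [5, 9, 2, 7]  # made-up values, not Shewhart's data
print(lagged_pairs(thickness, 1))
print(lagged_pairs(thickness, 1, circular=True))
```

The circular version yields n pairs at every lag, while the noncircular version yields n minus the lag. The extra pairs correspond to the items shown in parentheses in Table 7.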













Fig. 5. Comparative performance on a serial (autocorrelative) example.
Panels legible in the plot: B, circular product-moment correlations; C,
noncircular quadrant sums; each plotted against lag.

10. Convergence to the limiting distribution. We shall consider several
chance sums. One of these is $S$, which has the limiting distribution discussed
above. The second is $S_k$, obtained by curtailing $S$ so that no term exceeds $k$
in magnitude; its generating function is

    $G_k(x) = \left( \sum_{i=1}^{k} 2^{-(i+1)} x^{i} + \sum_{i=1}^{k} 2^{-(i+1)} x^{-i} \right)^{4}.$

The total probability assigned to $S_k$ is less than unity, so that there is nonzero
probability that $S_k$ is not defined. The third is $S_n$ itself, whose generating
function is (5), and the fourth is the result of the same sort of curtailment
applied to $S_n$. It will be denoted $S_{n,k}$, and its generating function is
$G_{n,k}(x)$. This again corresponds to a total probability less than unity.

It is clear that

    $\Pr(S_{n,k} = m) \le \Pr(S_n = m)$

and

    $\Pr(S_k = m) \le \Pr(S = m).$

We shall soon show that

(6)    $\lim_{n \to \infty} \Pr(S_{n,k} = m) = \Pr(S_k = m)$

and this will imply that

    $\lim_{n \to \infty} \Pr(S_n = m) = \Pr(S = m),$

which is the desired result. The implication runs as follows: given $\epsilon$ we can
choose $k$ so large that

    $\Pr(S_k \text{ defined}) > 1 - \epsilon/3,$

whence

    $|\Pr(S_k = m) - \Pr(S = m)| < \epsilon/3,$

and then choose $n$ so large that

    $|\Pr(S_{n,k} = m) - \Pr(S_k = m)| < \epsilon/(24k + 6)$  for $m = -4k, -4k + 1, \ldots, 4k,$

whence

    $\Pr(S_{n,k} \text{ defined}) > 1 - \frac{\epsilon}{3} - \frac{8k + 1}{24k + 6}\,\epsilon = 1 - \frac{16k + 3}{24k + 6}\,\epsilon$

and hence

    $|\Pr(S_{n,k} = m) - \Pr(S_n = m)| < \frac{16k + 3}{24k + 6}\,\epsilon,$

this inequality holding automatically for $|m| > 4k$. Hence,

    $|\Pr(S_n = m) - \Pr(S = m)|$
        $\le |\Pr(S_n = m) - \Pr(S_{n,k} = m)| + |\Pr(S_{n,k} = m) - \Pr(S_k = m)| + |\Pr(S_k = m) - \Pr(S = m)|$
        $< \frac{16k + 3}{24k + 6}\,\epsilon + \frac{\epsilon}{24k + 6} + \frac{\epsilon}{3} = \epsilon.$


This method is clearly of general application in such problems. 
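A small numerical check may help fix ideas. The Python sketch below rests on an assumption about the limiting sum, namely that it is the sum of four independent terms, each taking the values +i and -i with probability 2^-(i+1); this is consistent with the limits 2^-(i+1) used in this section and with the variance of 24 quoted in section 11. The function names are hypothetical. The sketch builds the curtailed distribution by convolution, and also confirms that the three error bounds of the argument add up to exactly epsilon.

```python
from fractions import Fraction


def curtailed_term(k):
    """One term of the sum, curtailed at magnitude k (a sub-probability law)."""
    return {s * i: Fraction(1, 2 ** (i + 1))
            for i in range(1, k + 1) for s in (1, -1)}


def convolve(p, q):
    """Distribution of the sum of two independent chance quantities."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out


def curtailed_sum(k, terms=4):
    dist = {0: Fraction(1)}
    for _ in range(terms):
        dist = convolve(dist, curtailed_term(k))
    return dist


k = 12
dist = curtailed_sum(k)
total = sum(dist.values())                       # less than unity
second_moment = sum(m * m * p for m, p in dist.items())

# The three error bounds of the epsilon argument, as fractions of epsilon.
pieces = Fraction(16 * k + 3, 24 * k + 6) + Fraction(1, 24 * k + 6) + Fraction(1, 3)

print(float(total), float(second_moment), pieces)
```

Under the stated assumption the total probability equals (1 - 2^-k)^4, the support runs from -4k to +4k, and the second moment approaches 24 from below as k grows.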

We turn now to the proof of (6). The expression for $G_{n,k}(x)$ shows that we
may consider it the result of the following process: the integer $j$ is a chance
quantity with the distribution

    $\Pr(j = j_0) = \frac{(n!)^2}{(2n)!} \binom{n}{j_0}^{2},$

and $G_{n,k}(x)$ is the average over $j$ of the generating function $G_{n,k,j}(x)$ for
fixed $j$. The first of these relations shows that $j/n$ converges stochastically
to $\tfrac12$ as $n$ approaches infinity. The second shows, since

    $\binom{n-i-1}{n-j-1} \Big/ \binom{n}{j} = \frac{(n-i-1)!\,(n-j)!\,j!}{(n-j-1)!\,(j-i)!\,n!} = \frac{(n-j)\,j(j-1)\cdots(j-i+1)}{n(n-1)(n-2)\cdots(n-i)}$

and

    $\binom{n-i-1}{j-1} \Big/ \binom{n}{j} = \frac{(n-i-1)!\,(n-j)!\,j!}{(n-j-i)!\,(j-1)!\,n!} = \frac{(n-j)(n-j-1)\cdots(n-j-i+1)\,j}{n(n-1)\cdots(n-i)},$

and both of these converge stochastically to $2^{-(i+1)}$ as $n$ approaches infinity,
that $G_{n,k,j}(x)$ converges stochastically to $G_k(x)$. Since these curtailed
generating functions involve only powers of $x$ in the finite range between $-4k$
and $+4k$, the limiting relation (6) follows at once.




11. Effectiveness of normal approximation. Fig. 3 shows the relation between
the asymptotic distribution of the quadrant sum for large n and a normal
distribution with variance 24, i.e., the same variance as that of the asymptotic
distribution. The normal approximation is calculated from

    $\Pr\{|S_n| \ge k\} = \Pr\{|x| \ge (k - \tfrac12)/\sqrt{24}\}$

where x is normally distributed with zero mean and unit variance. The asymptotic
and normal curves agree surprisingly well out to the 5% point, and an error
of a full unit in the significance level first occurs beyond the 0.5% point.

Since the asymptotic distributions for the quadrant, octant, hexadecant,
dotriacontant, ..., sums become more and more normal, the normal approximation
will be even better for higher dimensions. In r dimensions, this approximation
consists in treating

    $\frac{|S| - \tfrac12}{\sqrt{12r}}$

as the absolute value of a standard deviate. This should be quite adequate for
large samples and r > 4.
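The approximation is easy to evaluate numerically. The Python sketch below is an illustration under stated assumptions: it takes the standardized quantity to be (|S| - 1/2) divided by the square root of 12r, treats the working magnitude as the smallest integer whose approximate two-sided probability does not exceed the stated level, and uses hypothetical function names.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def approx_two_sided_p(s, r):
    """Approximate Pr(|S| >= |s|) for the corner sum in r dimensions."""
    z = (abs(s) - 0.5) / sqrt(12 * r)
    return 2 * (1 - STD_NORMAL.cdf(z))


def working_magnitude(level, r):
    """Smallest magnitude whose approximate probability is at most `level`."""
    z = STD_NORMAL.inv_cdf(1 - level / 2)
    return ceil(0.5 + z * sqrt(12 * r))


levels = (0.10, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001)
print([working_magnitude(level, 3) for level in levels])
```

With r = 3 this sketch returns the magnitudes 11, 13, 15, 16, 18, 20 and 21 listed in Table 3 for the octant sum.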

12. Unsolved problems. The central unsolved problem in connection with 
the quadrant sum is: 

(1) What is the operating characteristic? 

This has as a corollary the more general question: 

(2) How can the operating characteristic of a nonparametric test be de¬ 
scribed so as to be useful to the users of the test? 

There are, of course, minor problems which are much more easily soluble. A 
few, listed in order of practical importance, are: 

(3) What is the effect on the significance levels of the use of lagged values
of x as values of y?

(4) What are the exact distributions for moderate n in three or more dimen¬ 
sions? 

(5) Do the analogous limiting distributions hold for three or more dimen¬ 
sions? 

(6) What is a better approximation to the limiting distribution for moderate 
n? 

To encourage others to solve some of these, we close with the assurance that 
they have our good wishes. 


REFERENCES 

[1] W. A. Shewhart, “Contribution of statistics to the science of engineering,” University
    of Pennsylvania Bicentennial Conference, Volume on Fluid Mechanics and Statistical
    Methods in Engineering, University of Penn. Press, 1941, pp. 97-124.
    Also Monograph B-1319, Bell Tel. System Tech. Pub., 1941.

[2] L. H. C. Tippett, Statistical Methods in Industry, pamphlet published by Iron and
    Steel Research Council of the British Iron and Steel Federation, 1943.

[3] A. Wald, “An extension of Wilks' method for setting tolerance limits,” Annals of Math.
    Stat., Vol. 14 (1943), pp. 45-55.



DISCRIMINANT FUNCTIONS 

By George W. Brown 
Iowa State College 

1. Introduction: In the following sections the development of discriminant 
function techniques is approached from an elementary point of view, considering 
first an essentially trivial problem, then working up to the more complex situa¬ 
tions which may be handled by discriminant function methods. No attempt 
has been made to follow the pattern of the historical development in this process, 
and no consistent attempt has been made to allocate proper credit, in the text,
to those individuals responsible for the introduction and exploitation of these 
methods. A more or less exhaustive bibliography of discriminant function 
applications and related theory is given at the end of this paper. 

Some historical perspective may be gained, however, from a very sketchy 
consideration of the early background of the subject. The first published 
application of the discriminant function seems to have been the work of Barnard 
(1935 [1]) on craniometry, following the suggestion of R. A. Fisher. Meanwhile 
P. C. Mahalanobis (1927, [30]; 1930, [31]) and, in this country, Hotelling (1931, 
[25]) had been concerned with a closely related problem, the construction of 
measures of the “distance” between two sets of multiple measurements, for which 
Karl Pearson’s (1926, [34]) coefficient of racial likeness was not wholly adequate. 
Fisher (1936, [18]) gave a further example of the method and showed (1938, [19]) 
the relation between his work and that of Hotelling (1931, [25]; 1936, [27]). Thus 
the theory of discriminant function analysis proper is about ten years old, but is 
intimately related to researches which go back a few more years. 

A simple problem: Consider the very simple case of a single measurement, say
$\xi$, which may be made in each of two populations, and let us suppose, for the
sake of discussion, that $\xi$ is normally distributed, with unit variance, in each
population, but with possibly different means in the two populations.

Let

    $E_1(\xi) = \alpha - \beta$
    $E_2(\xi) = \alpha + \beta$

be the mean values of $\xi$ over the two populations, with $\beta > 0$. As an example,
we may consider the pH measurements of Iowa soil samples (Cox and Martin,
[12]), for two soil populations, distinguished by the presence or absence of
Azotobacter. From 100 samples containing Azotobacter and 186 samples containing
no Azotobacter, we have the estimated averages of pH equal to 7.423 and 6.015
respectively, with an estimated standard error of .625 within populations (see
Fig. 1). The estimates give

    $\alpha = 6.719$
    $\beta = .704$
    $\sigma = .625$
    $\beta/\sigma = 1.13.$

Let us suppose further that $\xi$ is the only measurement available on a single
individual, not knowing to which of populations 1 and 2 the individual belongs.


Fig. 1. Distribution of pH Measurements (horizontal axis marked at 6.015,
6.719 and 7.423)


The problem is to classify this individual as a member of population 1 or
population 2. It is clear that $\xi$ furnishes the only information on which to base a
decision, and that essentially the only procedure available is to choose a number,
say $\xi_0$, such that we choose population 1 when $\xi < \xi_0$ and population 2 when
$\xi > \xi_0$. Furthermore, it is evident that the expected accuracy of classification
depends on the size of $\beta$. If we wish to have equal risks of misclassification for
members of the two populations we choose $\xi_0 = \alpha$. Then the probability of
misclassification is given by $P\{\epsilon > \beta\}$, where $\epsilon$ is a normal deviate with unit
variance. As one would expect, the probability of misclassification tends to 0 as
$\beta \to \infty$ and tends to $\tfrac12$ as $\beta \to 0$. In the Azotobacter example, if we assume
that the estimates given are the population values, we choose $\xi_0 = 6.719$. The
ratio $\beta/\sigma = 1.13$ is exceeded approximately 13% of the time in sampling from
the normal distribution, leading to .13 as the probability of misclassification.
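The arithmetic of the Azotobacter example can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the estimates are treated as population values, as in the text.

```python
from statistics import NormalDist

mean_with, mean_without = 7.423, 6.015   # estimated pH averages
sd_within = 0.625                        # estimated standard error within populations

alpha = (mean_with + mean_without) / 2   # separation value for equal risks
beta = (mean_with - mean_without) / 2    # half the distance between the means

# Probability that a unit normal deviate exceeds beta / sigma.
p_misclass = 1 - NormalDist().cdf(beta / sd_within)

print(round(alpha, 3), round(beta, 3), round(beta / sd_within, 2), round(p_misclass, 2))
```

The separation value comes out at 6.719, the standardized half-distance at about 1.13, and the probability of misclassification at about .13.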

Consider now the slightly more general situation in which we consider a fixed
variate, say $w$, with measurements $\xi$ distributed, for fixed $w$, with a mean of the
form $\alpha + \beta w$. This is the standard regression situation. As before assume
that $\xi$ is normally distributed about this mean with unit variance, that is

    $\xi = \alpha + \beta w + \epsilon$

where $\alpha$ and $\beta$ are constants, $w$ may take on any or all real values, and $\epsilon$ is a
normal deviate. Note that if $w$ is restricted to take on only two values the
structure reduces to the first structure considered. An example of the continuous
type might be constructed by considering $w$ as genotypic yield of grain and $\xi$
a phenotypic measure of yield (Smith, [36]).

The simple problem formulated for the two-population case may be reformulated
here as follows: Given the relationship $\xi = \alpha + \beta w + \epsilon$, and given $\xi$ for
an individual for which no other information is known, how shall we estimate $w$?
For selective breeding the problem may be to select individuals for which $w$ is
at one end of the scale, rather than to estimate $w$ itself. Whatever decision is
to be made, it is still clear that $\xi$ furnishes the only available information, and
that the certainty of the decision is a function of $\beta$. Since $(\xi - \alpha)/\beta = w + \epsilon/\beta$,
the variance of this estimate of $w$ is $1/\beta^2$. Note that confidence intervals for $w$,
given $\xi$, may be constructed from the normally distributed quantity $\xi - \alpha - \beta w$.
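The construction of such an interval can be sketched as follows. The function name `w_interval` and the numerical values are hypothetical, and alpha and beta are taken as known. Since the quantity xi - alpha - beta w is a unit normal deviate, the interval collects the values of w for which it lies within the chosen normal limits.

```python
from statistics import NormalDist


def w_interval(xi, alpha, beta, level=0.95):
    """Confidence interval for w given one observed xi (alpha, beta known)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ends = ((xi - alpha - z) / beta, (xi - alpha + z) / beta)
    return min(ends), max(ends)   # ordering also covers a negative beta


# Hypothetical values: alpha = 1, beta = 2, observed xi = 5.
low, high = w_interval(5.0, 1.0, 2.0)
estimate = (5.0 - 1.0) / 2.0
print(estimate, low, high)
```

The interval is centred on the estimate (xi - alpha)/beta and has half-width z/|beta|, in line with the variance 1/beta^2 noted above.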

It should be pointed out that in the usual regression case we are interested in
predicting $\xi$ for given $w$, with the hypothesis as stated above, whereas in this
case $\xi$ will be observed, and the problem is that of estimating, as a parameter of
the distribution of $\xi$, the fixed variate $w$.

Obviously $\beta$ must not vanish if $\xi$ is to perform any discrimination among $w$
values. In practice, of course, $\alpha$ and $\beta$ will not be given as known values and the
variance of $\epsilon$ will not be known, but a finite set of observations may be available,
for which $w$ values are known and $\xi$ has been observed. The usual analysis of
variance provides a significance test for the non-vanishing of $\beta$, which is equivalent
to testing for the significance of the regression of $\xi$ on $w$.

It is to be noted that this analysis reduces to the conventional between-within
analysis (F or t-test) when we have the special case of two populations. Moreover,
if we had treated $\xi$ as the fixed variate instead of $w$, and considered the
regression of $w$ on $\xi$, the Analysis of Variance would have differed only in replacing
$\sum(\xi - \bar\xi)^2$ throughout by $\sum(w - \bar w)^2$ and the relevant F-test would have been
unchanged.

When probabilities of misclassification are estimated from finite samples, as
in the soil classification example, there are three sources of error: sampling error
in the estimate of the separation value $\xi_0$, sampling error in the estimate of the
distance between the population means, and sampling error in the estimated
standard deviation of $\xi$ within populations. It does not appear difficult to set
up confidence intervals for the probability of misclassification, assuming repeated
classification of individuals given fixed initial samples.





2. The one-dimensional discriminant function. We have been dealing so far
with the simple situation in which only one measurement per individual is
available for purposes of discrimination. Suppose we still have this measurement,
call it $\xi_1$ now, but we have other measurements as well, say $\xi_2, \ldots, \xi_p$.
As before $\xi_1 = \alpha_1 + \beta w + \epsilon_1$. For the moment suppose that the remaining
measurements have mean values independent of $w$, so that

    $\xi_m = \alpha_m + \epsilon_m, \quad (m = 2, \ldots, p),$

and let us assume also that the $\{\epsilon_m\}$ are mutually independent $(m = 1, 2, \ldots, p)$
and are normal deviates with unit variance. It is safe to assume that nobody
would ever argue, in this case, that the measurements $\xi_2, \ldots, \xi_p$ provide
information about the $w$ value for an individual. If, then, we were so fortunate
that we were in this situation, and knew so, we could say that $\xi_1$ is our
discriminant function, since, if any discriminating is to be done, $\xi_1$ has to do it.


TABLE 1

Analysis of Variance for Regression

               d.f.       Sums of Squares
Regression     1          $r^2 \sum(\xi - \bar\xi)^2$
Error          N - 2      $(1 - r^2) \sum(\xi - \bar\xi)^2$
Total          N - 1      $\sum(\xi - \bar\xi)^2$

    $r = \frac{\sum(\xi - \bar\xi)(w - \bar w)}{\sqrt{\sum(\xi - \bar\xi)^2 \sum(w - \bar w)^2}}$


Suppose, now, that the measurements $\xi_1, \xi_2, \ldots, \xi_p$ are not explicitly
available, but that we are able to observe a linearly equivalent set $x_1, x_2, \ldots, x_p$,
related to the $\{\xi_m\}$ by the transformation

    $x_m = \sum_{n=1}^{p} l_{mn} \xi_n$

where the $l_{mn}$ are unknown. For fixed $w$, $x_m$ has expected value

    $\sum_{n} l_{mn} \alpha_n + l_{m1} \beta w = a_m + b_m w,$

so that in general each $x_m$ observation provides information about $w$. Moreover,
the $x_m$ are not in general mutually independent; it is evident that the
population matrix of variances and covariances for fixed $w$ is given by

    $\sigma_{mn} = \sum_{k=1}^{p} l_{mk} l_{nk}.$

As an example of a set of correlated measurements, consider the Azotobacter
example referred to above. In addition to pH values, determinations of available
phosphate content and total nitrogen content were made on soil samples
in each of the two populations. Means were as follows:

                                           pH       Phosphate    Nitrogen
Mean of 100 samples with Azotobacter       7.423    133.120      29.400
Mean of 186 samples without Azotobacter    6.015     51.113      21.140
Mean difference                            1.408     82.007       8.260

Clearly the differences are proportional to the hypothetical $b_m$'s. The
variance-covariance matrix, estimated from the 284 degrees of freedom within populations,
is given by Table 2.


TABLE 2

$284(\sigma_{mn}) =$

               pH            Phosphate          Nitrogen
pH             111.0879      2,292.7192         198.4026
Phosphate                    1,042,799.1890     5,066.2645
Nitrogen                                        29,422.3655


Estimated correlation coefficients within populations are not large, .213 for pH 
and Phosphate, .110 for pH and Nitrogen, and .029 for Phosphate and Nitrogen. 
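The correlation coefficients just quoted follow directly from the entries of Table 2, since the common factor 284 cancels. A short Python sketch (the variable and function names are illustrative only):

```python
from math import sqrt

names = ["pH", "Phosphate", "Nitrogen"]
# Within-population sums of squares and products, i.e. 284 times (sigma_mn).
within = [
    [111.0879, 2292.7192, 198.4026],
    [2292.7192, 1042799.1890, 5066.2645],
    [198.4026, 5066.2645, 29422.3655],
]
dof = 284


def correlation(i, j):
    """Within-population correlation between measurements i and j."""
    return within[i][j] / sqrt(within[i][i] * within[j][j])


sd_ph = sqrt(within[0][0] / dof)  # standard error of pH within populations

for i in range(3):
    for j in range(i + 1, 3):
        print(names[i], names[j], round(correlation(i, j), 3))
print("pH standard error", round(sd_ph, 3))
```

The same table also returns the within-population standard error of .625 for pH used in the single-measurement example.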

Another example is furnished by Fisher’s Iris measurements [8], provid¬ 
ing sepal length, sepal width, petal length, and petal width for each of 50 
individuals of Iris setosa and 50 individuals of Iris versicolor. This example is 
an unfortunate one in that either petal length or petal width alone is sufficient 
to discriminate the two populations as completely as anybody has a right to 
expect anytime. The petal lengths, for example, vary between 1.0 and 1.9 cm. 
for the 50 setosa, and between 3.0 and 5.1 cm. for the 50 versicolor. 

Let us proceed, under the assumption that available measurements, $x_m$,
are distributed normally about mean values $a_m + b_m w$, with variance-covariance
matrix $\sigma_{mn}$ for fixed $w$, keeping in mind the underlying model of
$\xi_1, \xi_2, \ldots, \xi_p$, with

    $x_m = \sum_{n=1}^{p} l_{mn} \xi_n, \quad \xi_1 = \alpha_1 + \beta w + \epsilon_1; \quad \xi_2 = \alpha_2 + \epsilon_2; \quad \ldots; \quad \xi_p = \alpha_p + \epsilon_p.$

The skeptic may wish to grant the first part of our assumptions without granting
the hypothetical structure of $\xi$'s underlying the $x$'s. Hotelling's work [27]
shows that such an underlying structure of $\xi$'s may always be provided, given
the distribution of $x$'s for fixed $w$. In other words, a distribution of $x$'s for fixed
$w$ leads essentially uniquely to an underlying $\xi$ model.

The discriminant function, given $\sigma_{mn}$, $a_m$ and $b_m$, for $m, n = 1, 2, \ldots, p$,
is

    $X = \sum_{m,n=1}^{p} \sigma^{mn} b_m x_n = \sum_{n=1}^{p} l_n x_n$

where

    $l_n = \sum_{m=1}^{p} \sigma^{mn} b_m$

and $\sigma^{mn}$ is the reciprocal matrix to $\sigma_{mn}$. That is, the $\sigma^{mn}$ are the solutions
of the linear systems [17]

    $\sum_{k=1}^{p} \sigma^{mk} \sigma_{kn} = 0$ if $m \neq n$; $m, n = 1, 2, \ldots, p$

    $\sum_{k=1}^{p} \sigma^{mk} \sigma_{km} = 1$; $m = 1, \ldots, p.$

That $X$, as defined above, is properly called the discriminant function will
become evident immediately. Putting $b_m = \beta l_{m1}$, $x_n = \sum_k l_{nk} \xi_k$, we have

    $X = \beta \sum_{m,n,k} \sigma^{mn} l_{m1} l_{nk} \xi_k.$

Recalling that the $\sigma^{mn}$ are reciprocal to $\sigma_{mn} = \sum_k l_{mk} l_{nk}$, it can be seen that
$\sum_{m,n} \sigma^{mn} l_{m1} l_{nk} = 1$ if $k = 1$, and vanishes for $k \neq 1$. It follows that

    $X = \beta \xi_1,$

in other words, $X$ calculated as $\sum_{m,n} \sigma^{mn} b_m x_n$ from known population quantities
is proportional to the hypothetical $\xi_1$, the only one of the underlying measurements
which is related to $w$, thus justifying the term discriminant function for
$X$. It is clear that any other linear function of the $x$'s is also a linear function of
the $\xi$'s, and can discriminate, at best, only as well as $X$ itself, since all the $\xi$'s
are independent of $w$, with the exception of $\xi_1$. $X$ itself discriminates $w$ to the
same extent that $\xi_1$, were it available, would discriminate.
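The identity between $X$ and $\beta \xi_1$ can be checked numerically on a small made-up case. In the Python sketch below the matrix `L`, the value of `beta` and the vector `xi` are arbitrary illustrative choices, and `solve` is a plain elimination routine written for the purpose; none of these come from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


p = 3
L = [[2.0, 1.0, 0.0],      # arbitrary nonsingular l_mn
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
beta = 0.7
xi = [1.5, -2.0, 0.25]     # arbitrary underlying measurements

# sigma_mn = sum_k l_mk l_nk,  b_m = beta * l_m1,  x_m = sum_n l_mn xi_n
sigma = [[sum(L[m][k] * L[n][k] for k in range(p)) for n in range(p)]
         for m in range(p)]
b = [beta * L[m][0] for m in range(p)]
x = [sum(L[m][n] * xi[n] for n in range(p)) for m in range(p)]

coef = solve(sigma, b)     # coefficients sum_m sigma^{mn} b_m
X = sum(c * v for c, v in zip(coef, x))

print(X, beta * xi[0])
```

Both printed values agree (1.05 up to rounding error), whatever nonsingular matrix is used for `L`.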

The degree of discrimination of $w$'s depends, as indicated in the previous
section, on the ratio of the mean square of $\xi_1$ among $w$'s (mean square for
regression) to the mean square of $\xi_1$ for fixed $w$ (mean square for error). Since $X$
is proportional to $\xi_1$, the same is true when $X$ is substituted for $\xi_1$. It turns
out, of course, that $X$ is that linear combination of $x$'s for which the ratio of the
mean square for regression to the mean square for error is a maximum, or, what
is the same thing, $X$ is that linear combination of $x$'s which has the maximum
correlation with $w$. From any point of view $X$ appears to be the logical function
of $x$'s to compute. It is clear that $\lambda X$ is precisely as good as $X$, if $\lambda$ is any
nonzero constant.





In the two population case, where $w$ takes on only two values, $X$ is evidently
proportional to $\sum \sigma^{mn} (\mu_{m1} - \mu_{m2}) x_n$, where $\mu_{m1}$ and $\mu_{m2}$ are the mean values of
$x_m$ in the two populations. $X$ is here the particular linear combination of $x$'s
for which the ratio of the mean square between populations to the mean square
within populations is a maximum. The value of this ratio, which measures the
degree of discrimination possible, depends on the spread of the means of $X$
between the populations, or in general, on the spread of the means of $X$ over some
given distribution of $w$'s. Given $\sigma_{mn}$ and $b_m$ the larger the spread of $w$ values
the better overall discrimination will be obtainable. On the other hand, the
coefficients for $X$ depend only on $\sigma_{mn}$ and $b_m$.

Since $X$ is proportional to $\xi_1$, it follows that the discriminant function is
invariant under non-singular linear transformation of the $x$'s, that is, if some set of
$y$'s, linearly dependent on the $x$'s, had been observed, together with their means,
variances and covariances, the discriminant values would not have changed.
This invariance is obviously a desirable property, and as such was one of the
goals of Fisher, Hotelling, and Mahalanobis. One more property of the discriminant
function is of interest; $X$ is essentially equivalent to the maximum
likelihood estimate of $w$.

In our statistical model w plays the role of a fixed variate or population
parameter, and the x's have a joint distribution about linear functions of w as means.
Suppose now that $(\sigma_{mn})$ and $\{b_m\}$ are estimated from an analysis of variance
and covariance on data for which w as well as x values are known. The problem
of estimating w for a single individual whose x measurements are given resolves
into a two-stage estimation process, the first stage being the estimation of
$(\sigma_{mn})$ and $\{b_m\}$ from the initial data, the second stage being the estimation of w
by the discriminant function whose coefficients are computed from the
estimated $(\sigma_{mn})$ and $\{b_m\}$. It has already been pointed out that X is the linear
combination of x's which has greatest correlation with w. It turns out, then,
that the coefficients of X are proportional to those which would have been
obtained from a formal regression analysis of w on $x_1, x_2, \ldots, x_p$, considering the
x's as independent variables and w as dependent variable, a direct interchange of
roles as compared with the statistical model we have assumed. Of course two
linear functions differing only by a factor of proportionality are equivalent in
discrimination. If the formal analysis of variance is carried out for testing the
significance of the regression of w on $x_1, x_2, \ldots, x_p$, the relevant F ratio
remains a valid test for the non-vanishing of the $b_m$ in spite of the inversion of
dependent and independent variables. The analysis of variance is given in
Table 3.

R is, of course, the conventional multiple correlation coefficient. An equiva¬ 
lent analysis can be carried out for X itself, allowing sufficient degrees of freedom 
for the estimation of the constants in X, as given in Table 4. 

This analysis is proportional to the analysis given above. It might be noted 
that the mean square corresponding to error sum of squares in this analysis is 
$\sum \sigma^{mn} b_m b_n$, which is X evaluated for $x_n = b_n$ $(n = 1, 2, \ldots, p)$.





In the Azotobacter example, Cox and Martin arrive at a discriminant function 
which has the analysis given in Table 5. 

TABLE 3

Analysis of Variance for Regression

                 d.f.           Sums of Squares
Regression       p              $R^2 \sum (w - \bar{w})^2$
Error            N - p - 1      $(1 - R^2) \sum (w - \bar{w})^2$
Total            N - 1          $\sum (w - \bar{w})^2$


TABLE 4

Analysis of Variance for X on w

                 d.f.           Sums of Squares
Regression       p              $R^2 \sum (X - \bar{X})^2$
Error            N - p - 1      $(1 - R^2) \sum (X - \bar{X})^2$
Total            N - 1          $\sum (X - \bar{X})^2$


TABLE 5

Analysis of Variance of Discriminant Function

                        d.f.    Sums of Squares    Mean Square
Between populations     3       .030842            .01028
Within populations      282     .021777            .00007722
Total                   285


It is evident that the difference between populations is highly significant. The
choice of scale for X in this case forces the sum of squares within populations to
be equal to the difference between the mean X values for the two populations.
Thus the mean X differs by .021777 for the two populations, and has an
estimated standard error, within populations, equal to $\sqrt{.00007722} = .008788$.
Half the difference, divided by the standard error, is the normal deviate
corresponding to misclassification, if equal risks are taken. In this case the value
of the normal deviate is 1.24, approximately, leading to an estimated probability
of misclassification of about .11, which is not very much better than the .13
which one would have obtained if pH alone had been used.
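
The arithmetic of this paragraph can be retraced with a short calculation. What
follows is a minimal sketch in Python using only the two figures quoted from
Table 5; the function name `misclassification_probability` is ours, and the
normal tail area is taken from the standard library's `math.erfc`.

```python
import math

def misclassification_probability(mean_difference, within_mean_square):
    """Return (standard error, normal deviate, tail probability).

    Assumes equal risks, so that the cut-off lies midway between the two
    population means of X.
    """
    standard_error = math.sqrt(within_mean_square)
    deviate = 0.5 * mean_difference / standard_error
    # Upper tail area of the standard normal distribution beyond `deviate`.
    probability = 0.5 * math.erfc(deviate / math.sqrt(2.0))
    return standard_error, deviate, probability

# Figures quoted in the text for the Azotobacter example.
se, z, prob = misclassification_probability(0.021777, 0.00007722)
print(f"standard error = {se:.6f}, deviate = {z:.2f}, probability = {prob:.2f}")
```

The same function can be applied to any other discriminant function for which
the difference in means and the within-populations mean square are available.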

In this problem, as in conventional regression analysis, it is tempting, for






various reasons, to consider the possibility of using smaller sets of classifying
measurements. Moreover, a significance test for this situation is in general
more interesting, as a practical matter, than the significance test for differences
among populations, since the initial presumption is that we are interested in
being able to discriminate, on the basis of $x_1, x_2, \ldots, x_p$. Suppose, for
example, we wish to test whether the discriminant function $X$ based on
$x_1, x_2, \ldots, x_p$ is significantly better than the discriminant function $X^{(r)}$ based on
$x_1, \ldots, x_r$, with $r < p$. The relevant test is precisely the same as the test

TABLE 6

Analysis of Variance for Rejecting $x_{r+1}, \ldots, x_p$

                                                           d.f.         Sums of Squares
Regression on $x_1, \ldots, x_r$                           r            $S_r^2$
Regression on $x_1, \ldots, x_r, x_{r+1}, \ldots, x_p$     p            $S_p^2$
Difference                                                 p - r        $S_p^2 - S_r^2$
Error                                                      N - p - 1    $S_T^2 - S_p^2$
Total                                                      N - 1        $S_T^2$


TABLE 7

Analysis of Variance for $X = X_0$

                                       d.f.         Sums of Squares
Regression on $X_0$                    1            $S_0^2$
Regression on $x_1, \ldots, x_p$       p            $S_p^2$
Difference                             p - 1        $S_p^2 - S_0^2$
Error                                  N - p - 1    $S_T^2 - S_p^2$
Total                                  N - 1        $S_T^2$


calculated formally from the regression of w on the sets $x_1, \ldots, x_r$ and
$x_1, x_2, \ldots, x_p$, with the analysis of variance given in Table 6.

Similarly, if we wish to test for the significance of a theoretical discriminant
function, $X_0$, with preassigned coefficients, as compared with $X_p$, we have
again the conventional test calculated from the formal analysis of the regression
of w on $x_1, x_2, \ldots, x_p$, as given in Table 7.

As shown by Fisher [21] the relevant F test for this hypothesis is computable
as

$$F_{p-1,\,n-p+1} = \frac{n - p + 1}{p - 1} \cdot \frac{R'^2}{1 - R^2},$$








where $R'^2 = R^2(1 - r^2)$, $r$ is the correlation between $X$ and $X_0$ for fixed $w$, and
$R$ is the multiple correlation for $w$ on $x_1, \ldots, x_p$, or, what is the same thing, the
correlation of $w$ and $X$.
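
As a hedged illustration of the two tests just described, the sketch below
computes the variance ratio for rejecting $x_{r+1}, \ldots, x_p$ (the Difference
line of Table 6 against the Error line) and Fisher's ratio
$\frac{n-p+1}{p-1} \cdot \frac{R^2(1-r^2)}{1-R^2}$ for a theoretical discriminant
function. The function names and all numerical inputs are hypothetical, chosen
only to exercise the formulas; the first function assumes that the error sum of
squares carries N - p - 1 degrees of freedom, as in the tables.

```python
def f_for_rejecting_variables(ss_reg_r, ss_reg_p, ss_error, r, p, n_total):
    """F ratio on (p - r, N - p - 1) d.f. for dropping x_{r+1}, ..., x_p."""
    difference_ms = (ss_reg_p - ss_reg_r) / (p - r)
    error_ms = ss_error / (n_total - p - 1)
    return difference_ms / error_ms

def f_for_theoretical_function(big_r_sq, r_corr, p, n):
    """Fisher's ratio on (p - 1, n - p + 1) d.f.

    big_r_sq is R^2 for w on x_1..x_p; r_corr is the correlation between
    X and the preassigned X_0 for fixed w, so that R'^2 = R^2 (1 - r^2).
    """
    r_prime_sq = big_r_sq * (1.0 - r_corr ** 2)
    return (n - p + 1) / (p - 1) * r_prime_sq / (1.0 - big_r_sq)

# Hypothetical inputs, not taken from any data set in the paper.
print(f_for_rejecting_variables(6.0, 8.0, 12.0, r=2, p=4, n_total=29))
print(f_for_theoretical_function(0.5, 0.8, p=3, n=22))
```

Either ratio would then be referred to the F distribution with the degrees of
freedom named in the corresponding docstring.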

The example of Smith [36] is an example in which the relationships of x's
to w have to be estimated from analysis of variance and covariance of data in
which the w's are not really known, being related to genotypes. The regression
of the x's on w is estimated by a generalization of the components-of-variance
method, from variance-covariance analyses in which the usual null hypotheses
are significantly contradicted. The net effect is that the usual significance
tests now fail to hold, although the algebraic calculations are formally equivalent
to those given above, once the population relations of x's to w are established.
When work of this kind is based on small samples, there is some difficulty in
estimating the reliability of the results.

3. Multi-dimensional discriminant functions. Instead of trying to
discriminate between two populations or estimate a single parameter w, our problem may
be to discriminate among several populations, not necessarily linearly related,
or to estimate many independent parameters $w_1, w_2, \ldots, w_s$. Just as a single
parameter w is sufficient to distinguish between means of measurements for two
different populations, s parameters are sufficient to distinguish between means
of s + 1 different populations, and exactly s parameters will be required, if
no linear relation obtains among the s + 1 populations. For example, with
three populations, any measurement mean may be given the three possible
values $\alpha$, $\alpha + \beta$, $\alpha + \gamma$, corresponding to $w_1 = w_2 = 0$ for population 1, $w_1 = 1$,
$w_2 = 0$ for population 2, and $w_1 = 0$, $w_2 = 1$ for population 3. Geometrically
we have to consider a set of parameter values as a point in an s-dimensional
space.

The one-dimensional discriminant function admits two very different general¬ 
izations in higher dimensions. The practical solution to a particular problem 
for which s is moderately large may involve a mixture of both generalizations. 

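Let us generalize our statistical model before discussing the discrimination
problem. To avoid complication of algebraic notation, let us for the moment
assume s = 2. We will now postulate a set of hypothetical measurements
$\xi_1, \xi_2, \ldots, \xi_p$, with

$$\begin{aligned}
\xi_1 &= \alpha_1 + \beta_1 u + \gamma_1 v + \epsilon_1 \\
\xi_2 &= \alpha_2 + \beta_2 u + \gamma_2 v + \epsilon_2 \\
\xi_3 &= \alpha_3 + \epsilon_3 \\
&\;\;\vdots \\
\xi_p &= \alpha_p + \epsilon_p,
\end{aligned}$$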





where the $\epsilon$'s are independent normal deviates with unit variance, u and v are
fixed variates or parameters corresponding to the different populations, and
$\alpha_1, \alpha_2, \ldots, \alpha_p$, $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are constants. Evidently $\xi_3, \ldots, \xi_p$ can
yield no information about u and v; $\xi_1$ and $\xi_2$ together contain all the information
there is to get about u and v. As before, assume that our data will be in the form
of linear combinations $x_m = \sum_n l_{mn} \xi_n$, with unknown coefficients $l_{mn}$. The
variance-covariance matrix within populations, or for fixed u, v, is still given by
$\sigma_{mn} = \sum_k l_{mk} l_{nk}$. The mean values of the x's for fixed u, v are given by

$$E(x_m) = \sum_n l_{mn}\alpha_n + (l_{m1}\beta_1 + l_{m2}\beta_2)u + (l_{m1}\gamma_1 + l_{m2}\gamma_2)v
= a_m + b_m u + c_m v.$$

This model is again justifiable on the basis of Hotelling’s work. 

The first question to ask is whether we can now form two linear combinations
of the x's and get rid of $\xi_3, \ldots, \xi_p$ in both, thus providing a two dimensional
description of an individual on the basis of $x_1, x_2, \ldots, x_p$. The answer here
is in the affirmative, as a result of a direct generalization of the method
discussed earlier. If we calculate $X_1 = \sum \sigma^{mn} b_m x_n$ and $X_2 = \sum \sigma^{mn} c_m x_n$, we are
fortunate enough to get

$$X_1 = \beta_1 \xi_1 + \beta_2 \xi_2$$

$$X_2 = \gamma_1 \xi_1 + \gamma_2 \xi_2$$

with no disturbing elements from $\xi_3, \ldots, \xi_p$. Assuming for now that $X_1$ and
$X_2$ are not merely proportional, i.e. $\beta_1\gamma_2 - \beta_2\gamma_1 \ne 0$, what do we do with $X_1$ and
$X_2$?
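
The elimination of $\xi_3, \ldots, \xi_p$ can be checked numerically. The following is
a minimal sketch in plain Python for p = 3, assuming a square non-singular
matrix of coefficients $l_{mn}$ and taking $\sigma^{mn}$ to be the elements of the
inverse of $(\sigma_{mn})$; the matrix `L`, the constants `beta` and `gamma`, and all
function names are hypothetical choices made for illustration.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    aug = [list(map(float, a[i])) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * c for v, c in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical non-singular coefficients l_mn (p = 3) and constants.
L = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
beta = [1.5, -0.5, 0.0]    # beta_1, beta_2; xi_3 carries no information on u
gamma = [0.5, 2.0, 0.0]    # gamma_1, gamma_2; nor on v

sigma_inv = inverse(mat_mul(L, transpose(L)))      # the sigma^{mn}
b = [sum(L[m][n] * beta[n] for n in range(3)) for m in range(3)]
c = [sum(L[m][n] * gamma[n] for n in range(3)) for m in range(3)]

def discriminant_pair(xi):
    """Return (X_1, X_2) computed from the observable x = L xi."""
    x = [sum(L[m][n] * xi[n] for n in range(3)) for m in range(3)]
    X1 = sum(sigma_inv[m][n] * b[m] * x[n] for m in range(3) for n in range(3))
    X2 = sum(sigma_inv[m][n] * c[m] * x[n] for m in range(3) for n in range(3))
    return X1, X2

xi = [0.7, -1.2, 5.0]
print(discriminant_pair(xi))
print(beta[0] * xi[0] + beta[1] * xi[1], gamma[0] * xi[0] + gamma[1] * xi[1])
```

The two printed pairs should agree up to rounding, and changing the third
component of `xi` should leave the first pair unchanged.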

For fixed u, v, we have

$$E(X_1) = \sum \sigma^{mn} b_m a_n + u \sum \sigma^{mn} b_m b_n + v \sum \sigma^{mn} b_m c_n
= A_1 + B_1 u + C_1 v$$

$$E(X_2) = \sum \sigma^{mn} c_m a_n + u \sum \sigma^{mn} c_m b_n + v \sum \sigma^{mn} c_m c_n
= A_2 + B_2 u + C_2 v$$

and variances and covariance

$$\tau_{11} = \sum \sigma^{mn} b_m b_n = B_1$$
$$\tau_{12} = \sum \sigma^{mn} b_m c_n = C_1 = B_2$$
$$\tau_{22} = \sum \sigma^{mn} c_m c_n = C_2.$$

We may, for example, estimate u and v by solving the equations

$$B_1 u + C_1 v = X_1 - A_1$$

$$B_2 u + C_2 v = X_2 - A_2,$$





or we may set up regions in the $X_1$, $X_2$ plane for which certain decisions are
made. For example, when classifying an individual into one of three
populations, we might delineate regions, as in Fig. 2.

Then the particular individual would be classified as coming from population I,
II, or III, according to which region $X_1$, $X_2$ falls in. The individual points
shown in the figure represent the expected values of $X_1$, $X_2$ for each of the three
populations. No exhaustive investigation has been made for this situation, but
some fairly obvious methods are available for constructing such regions.
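
For the first of these two uses, estimating u and v from the pair of equations
$B_1 u + C_1 v = X_1 - A_1$, $B_2 u + C_2 v = X_2 - A_2$, a minimal sketch follows.
It solves the two equations by Cramer's rule; the function name and the numerical
constants are hypothetical, and the constants are chosen with $C_1 = B_2$ as the
text requires.

```python
def estimate_u_v(X1, X2, A1, A2, B1, C1, B2, C2):
    """Solve B1*u + C1*v = X1 - A1 and B2*u + C2*v = X2 - A2."""
    det = B1 * C2 - C1 * B2
    if det == 0:
        raise ValueError("X1 and X2 are proportional; u and v cannot be separated")
    r1, r2 = X1 - A1, X2 - A2
    u = (r1 * C2 - C1 * r2) / det
    v = (B1 * r2 - B2 * r1) / det
    return u, v

# Hypothetical constants, not taken from the paper.
print(estimate_u_v(X1=5.0, X2=4.0, A1=1.0, A2=0.5, B1=2.0, C1=1.0, B2=1.0, C2=3.0))
```

The vanishing determinant corresponds to the case in which $X_1$ and $X_2$ are
merely proportional, which the text sets aside.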

With respect to significance tests when the $\sigma_{mn}$, $a_m$, $b_m$, $c_m$ are estimated from
samples, the whole gamut of multivariate analysis has to be run. Tests
analogous to (but more complicated than) F tests exist for testing the significance
of the discrimination, the significance of a subset of the x's, and the significance
of a theoretical pair $X_{1,0}$, $X_{2,0}$ (Wilks [41], [42], [43]).

Fig. 2. Classification Regions in Plane

For some purposes a two-dimensional discriminant function $X_1$, $X_2$ may be
unsatisfactory. For example, we might suspect that $\beta_1\gamma_2 = \beta_2\gamma_1$ (or that the
relationship is nearly satisfied). Under these circumstances $X_1$ is (nearly)
proportional to $X_2$, and we would like to compute the best one-dimensional
discriminant function, even though we have started with two linear parameters
u and v. Even if $\beta_1\gamma_2 \ne \beta_2\gamma_1$ we might still ask for the best one-dimensional
discriminant function, in order to rank our populations on the “best” linear scale.
If we define Y as that linear combination of $x_1, x_2, \ldots, x_p$ which has the largest
multiple correlation with u and v, we have generalized the simple
one-dimensional discriminant function in a second direction.

Before proceeding, it is useful to recognize that Y, as defined above, must be a
function of $X_1$, $X_2$, since $X_1$ and $X_2$ together contain all the information about
u and v that can be obtained from the x's.

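Now suppose we consider an arbitrary linear combination $Y = \lambda_1 X_1 + \lambda_2 X_2$.
Y correlates best with

$$\lambda_1(\tau_{11}u + \tau_{12}v) + \lambda_2(\tau_{12}u + \tau_{22}v)
= (\lambda_1\tau_{11} + \lambda_2\tau_{12})u + (\lambda_1\tau_{12} + \lambda_2\tau_{22})v.$$

We now have to choose $\lambda_1$ and $\lambda_2$ to maximize this correlation. This
correlation will be maximized if we maximize the ratio of the variance of

$$(\lambda_1\tau_{11} + \lambda_2\tau_{12})u + (\lambda_1\tau_{12} + \lambda_2\tau_{22})v$$

(over the distribution of u and v values) to the variance of Y for fixed u and v.
Call the first quantity $S_1$, the second $S_2$. Then
$S_2 = \lambda_1^2\tau_{11} + 2\lambda_1\lambda_2\tau_{12} + \lambda_2^2\tau_{22}$
and $S_1$ is of the form $\lambda_1^2\mu_{11} + 2\lambda_1\lambda_2\mu_{12} + \lambda_2^2\mu_{22}$, where

$$\mu_{11} = \tau_{11}^2\sigma_{uu} + 2\tau_{11}\tau_{12}\sigma_{uv} + \tau_{12}^2\sigma_{vv}$$

$$\mu_{12} = \tau_{11}\tau_{12}\sigma_{uu} + (\tau_{12}^2 + \tau_{11}\tau_{22})\sigma_{uv} + \tau_{12}\tau_{22}\sigma_{vv}$$

$$\mu_{22} = \tau_{12}^2\sigma_{uu} + 2\tau_{12}\tau_{22}\sigma_{uv} + \tau_{22}^2\sigma_{vv}.$$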


Maximizing $S_1/S_2$ leads to the equations:

$$\lambda_1\mu_{11} + \lambda_2\mu_{12} = \frac{S_1}{S_2}(\lambda_1\tau_{11} + \lambda_2\tau_{12})$$

$$\lambda_1\mu_{12} + \lambda_2\mu_{22} = \frac{S_1}{S_2}(\lambda_1\tau_{12} + \lambda_2\tau_{22})$$

i.e.

$$\lambda_1(\mu_{11} - \theta\tau_{11}) + \lambda_2(\mu_{12} - \theta\tau_{12}) = 0$$

$$\lambda_1(\mu_{12} - \theta\tau_{12}) + \lambda_2(\mu_{22} - \theta\tau_{22}) = 0, \quad \text{with } \theta = S_1/S_2.$$

It is thus seen that $\theta$ must satisfy the quadratic equation

$$(\mu_{11} - \theta\tau_{11})(\mu_{22} - \theta\tau_{22}) - (\mu_{12} - \theta\tau_{12})^2 = 0,$$

in order for solutions $\lambda_1$, $\lambda_2$ to exist. In general there will be two solutions, of
which the greater corresponds to that linear combination $\lambda_1 X_1 + \lambda_2 X_2$ which has
greatest multiple correlation with u and v, whereas the smaller corresponds to
that linear combination which has least multiple correlation with u and v.
$\theta$ itself corresponds to $R^2/(1 - R^2)$ for the regression of $\lambda_1 X_1 + \lambda_2 X_2$ on u, v.
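
A small numerical sketch may help fix the idea. Setting the derivatives of
$S_1/S_2$ with respect to $\lambda_1$ and $\lambda_2$ to zero gives
$\mu\lambda = \theta\tau\lambda$ with $\theta = S_1/S_2$, so that $\theta$ is a root of
$\det(\mu - \theta\tau) = 0$. The values used below for the $\tau$'s and for the
variances and covariance of u and v are hypothetical, and the helper names are
ours.

```python
import math

def mu_matrix(t11, t12, t22, s_uu, s_uv, s_vv):
    """Coefficients mu_11, mu_12, mu_22 of S_1, obtained by expanding the
    variance of (l1*t11 + l2*t12)*u + (l1*t12 + l2*t22)*v."""
    m11 = t11 * t11 * s_uu + 2 * t11 * t12 * s_uv + t12 * t12 * s_vv
    m12 = t11 * t12 * s_uu + (t12 * t12 + t11 * t22) * s_uv + t12 * t22 * s_vv
    m22 = t12 * t12 * s_uu + 2 * t12 * t22 * s_uv + t22 * t22 * s_vv
    return m11, m12, m22

def stationary_ratios(t11, t12, t22, m11, m12, m22):
    """Roots theta of det(mu - theta * tau) = 0, largest first."""
    a = t11 * t22 - t12 * t12
    b = -(m11 * t22 + m22 * t11 - 2 * m12 * t12)
    c = m11 * m22 - m12 * m12
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def weights_for(theta, t11, t12, m11, m12):
    """One solution (lambda_1, lambda_2) of the first linear equation."""
    return -(m12 - theta * t12), (m11 - theta * t11)

def ratio(l1, l2, t11, t12, t22, m11, m12, m22):
    """S_1 / S_2 for the linear combination l1*X_1 + l2*X_2."""
    s1 = l1 * l1 * m11 + 2 * l1 * l2 * m12 + l2 * l2 * m22
    s2 = l1 * l1 * t11 + 2 * l1 * l2 * t12 + l2 * l2 * t22
    return s1 / s2

tau = (2.0, 0.5, 1.0)                  # hypothetical tau_11, tau_12, tau_22
mu = mu_matrix(*tau, 1.0, 0.3, 2.0)    # hypothetical sigma_uu, sigma_uv, sigma_vv
theta_max, theta_min = stationary_ratios(*tau, *mu)
l1, l2 = weights_for(theta_max, tau[0], tau[1], mu[0], mu[1])
print(theta_max, theta_min, ratio(l1, l2, *tau, *mu))
```

The ratio attained by the weights found for the larger root should reproduce
that root, and any other choice of weights should give a ratio lying between the
two roots.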

In the general case with s degrees of freedom corresponding to $w_1, w_2, \ldots, w_s$,
there is an s-dimensional discriminant function $(X_1, X_2, \ldots, X_s)$, and a set of
s linear combinations for which $R^2/(1 - R^2)$ is stationary with respect to
$\lambda_1, \ldots, \lambda_s$.

The s roots (corresponding to an equation of degree s) arranged in decreasing
order, permit construction of the best one-dimensional, two-dimensional, ...,
$(s - 1)$-dimensional discriminant functions.





Discussion of the relevant significance tests for these reduced discriminant 
functions is beyond the scope of this paper. Reference may be made to the 
work of Hotelling and Fisher. 


REFERENCES 


[1] M. M. Barnard, “The secular variations of skull characters in four series of Egyptian
skulls”, Ann. Eugen., Vol. 6 (1935), pp. 352-371.

[2] M. S. Bartlett, “The standard errors of discriminant function coefficients”, Jour.
Roy. Stat. Soc., Suppl. 6 (1939), pp. 169-173.

[3] M. S. Bartlett, “Statistical significance of canonical correlations”, Biometrika,
Vol. 32 (1942), pp. 29-37.

[4] W. D. Baten, “The discriminant function applied to spore measurements”, Mich.
Acad. of Sci., Arts, and Letters, Vol. 29 (1943), pp. 3-7.

[5] W. D. Baten and C. C. DeWitt, “Use of the discriminant function in the comparison
of proximate coal analyses”, Indust. and Eng. Chem., Anal. Ed., Vol. 16 (1944),
pp. 32-34.

[6] W. D. Baten and H. M. Hatcher, “Distinguishing method differences by use of
discriminant functions”, Jour. of Exp. Ed., March, 1944.

[7] W. D. Baten, “The use of discriminant functions in comparing judges’ scores
concerning potatoes”, Jour. Amer. Stat. Assoc., Vol. 40 (1945), pp. 223-227.

[8] R. C. Bose, “On the exact distribution of $D^2$ statistic”, Sankhya, Vol. 2 (1936),
pp. 143-154.

[9] R. C. Bose and S. N. Roy, “The exact distribution of the Studentized $D^2$ statistic”,
Sankhya, Vol. 4 (1938), pp. 19-38.

[10] G. W. Brier, R. G. Schoot, and V. L. Simmons, “The discriminant function applied
to quality rating in sheep”, Proc. Amer. Soc. An. Prod., Vol. 1 (1940), pp. 153-160.

[11] W. G. Cochran, “The comparison of different scales of measurement for experimental
results”, Annals of Math. Stat., Vol. 14 (1943), pp. 205-216.

[12] G. M. Cox and W. P. Martin, “Use of a discriminant function for differentiating soils
with different Azotobacter populations”, Iowa State Col. Jour. of Sci., Vol. 11
(1937), pp. 323-331.

[13] B. B. Day and M. M. Sandomire, “Use of the discriminant function for more than
two groups”, Jour. Amer. Stat. Assoc., Vol. 37 (1942), pp. 461-472.

[14] W. E. Deming, “On the chi test and curve fitting”, Jour. Amer. Stat. Assoc., Vol. 29
(1934), pp. 372-382.

[15] D. Durand, “Risk elements in consumer installment financing”, Nat. Bur. Econ.
Res., Inc., Studies in Consumer Installment Financing, No. 8, 1941.

[16] R. A. Fisher, “The general sampling distribution of the multiple correlation
coefficient”, Proc. Roy. Soc. A., Vol. 121 (1928), pp. 651-673.

[17] R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, Section 29.

[18] R. A. Fisher, “The use of multiple measurements in taxonomic problems”, Ann.
Eugen., Vol. 7 (1936), pp. 179-188.

[19] R. A. Fisher, “Statistical utilization of multiple measurements”, Ann. Eugen., Vol.
8 (1938), pp. 376-386.

[20] R. A. Fisher, “The sampling distribution of some statistics obtained from non-linear
equations”, Ann. Eugen., Vol. 9 (1939), pp. 238-249.

[21] R. A. Fisher, “The precision of discriminant functions”, Ann. Eugen., Vol. 10 (1940),
pp. 422-429.

[22] H. E. Garrett, “The discriminant function and its use in psychology”, Psychometrika,
Vol. 8 (1943), pp. 65-79.

[23] M. A. Girshick, “On the sampling theory of the roots of determinantal equations”,
Annals of Math. Stat., Vol. 10 (1939), pp. 203-224.





[24] T. Haavelmo, “Probability applications in econometrics”, Econometrica, Suppl.
Vol. 12 (1944).

[25] H. Hotelling, “Generalization of Student’s ratio”, Annals of Math. Stat., Vol. 2
(1931), pp. 360-378.

[26] H. Hotelling, “The most predictable criterion”, Jour. Educ. Psychol., Vol. 26 (1935),
pp. 139-142.

[27] H. Hotelling, “Relations between two sets of variates”, Biometrika, Vol. 28 (1936),
pp. 321-377.

[28] P. L. Hsu, “On the distribution of roots of certain determinantal equations”, Ann.
Eugen., Vol. 9 (1939), pp. 250-258.

[29] W. G. Madow, “Contributions to the theory of multivariate statistical analyses”,
Trans. Am. Math. Soc., Vol. 44 (1938), pp. 454-495.

[30] P. C. Mahalanobis, “Analysis of race mixture in Bengal”, Jour. Asiat. Soc. Beng.,
Vol. 23 (1927), pp. 301-333.

[31] P. C. Mahalanobis, “On tests and measures of group divergence”, Jour. Asiat. Soc.
Beng., Vol. 26 (1930), pp. 541-588.

[32] P. C. Mahalanobis, “On the generalized distance in statistics”, Proc. Nat. Inst. Sci.
Ind., Vol. 12 (1936), pp. 49-55.

[33] E. A. Martin, “A study of an Egyptian series of mandibles, with special reference to
mathematical methods of sexing”, Biometrika, Vol. 28 (1936), pp. 149-178.

[34] K. Pearson, “On the coefficient of racial likeness”, Biometrika, Vol. 18 (1926), pp.
105-117.

[35] C. R. Rao, “Tests with discriminant functions in multivariate analysis”, Sankhya,
Vol. 7 (1946), pp. 407-414.

[36] H. F. Smith, “A discriminant function for plant selection”, Ann. Eugen., Vol. 7 (1936),
pp. 240-250.

[37] W. L. Stevens, “The standardization of rubber flexing tests”, India Rubber World,
August 1, 1940.

[38] A. Wald, “On a statistical problem arising in the classification of an individual into
one of two groups”, Annals of Math. Stat., Vol. 15 (1944), pp. 145-162.

[39] N. Wallace and R. M. W. Travers, “A psychometric sociological study of a group of
specialty salesmen”, Ann. Eugen., Vol. 8 (1938), pp. 266-302.

[40] F. V. Waugh, “Regression between sets of variables”, Econometrica, Vol. 10 (1942),
pp. 290-310.

[41] S. S. Wilks, “Certain generalizations in the analysis of variance”, Biometrika, Vol.
24 (1932), pp. 471-494.

[42] S. S. Wilks, “On the independence of k sets of normally distributed statistical
variables”, Econometrica, Vol. 3 (1935), pp. 309-326.

[43] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943.



NON-PARAMETRIC ESTIMATION II. STATISTICALLY EQUIVALENT 

BLOCKS AND TOLERANCE REGIONS—THE CONTINUOUS CASE 

By John W. Tukey 
Princeton University 

1. Summary. Wald [2, 1943] extended the usefulness of tolerance limits to
the simplest multi-dimensional cases. His principle is here used to provide
many new ways of using a sample of n to divide the range of the population into
n + 1 blocks of known behavior. The exact tolerance distribution for the
proportions of the population covered by these blocks is extended from the case
of a continuous probability density function to the case of a continuous
cumulative distribution function. Such an extension is needed in dealing completely
with multivariate cases even where the underlying distribution is as smooth as a
multivariate normal distribution.

The devices used in Paper I [1] to extend the usefulness of tolerance limits to 
the case of a discontinuous underlying distribution will be applied in the next 
paper of this series, with some extension, to extend the usefulness of these gen¬ 
eral tolerance regions to the case of a discontinuous distribution. Some of these 
results specialize into new results for the univariate case, although they do not 
seem to have any immediate practical application. 

The author wishes to acknowledge the stimulation given to his work on this
problem by Henry Scheffé, whose modesty has kept this paper from the joint
authorship of papers I [1, Scheffé and Tukey 1945] and IV (not yet written).

2. Introduction. Wald's great contribution to the theory of tolerance limits
was his method of successive elimination. As originally presented for a
bivariate situation it ran roughly as follows: Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
be a sample of n from an arbitrary bivariate population. The type of tolerance
region to be used is determined by four preassigned integers, $k_1$, $k_2$, $k_3$, and
$k_4$. The procedure is as follows: Order the n observations according to their x
values. Select the $k_1$ highest, and let the x coordinate of the lowest of these $k_1$
be $x_u$. Select the $k_2$ lowest, and let the x coordinate of the highest of these
$k_2$ be $x_l$. Discard these $k_1 + k_2$ selected observations, and order the remaining
$n - k_1 - k_2$ observations according to their y values. Select the $k_3$ highest of
these remaining observations, and let the y coordinate of the lowest of these $k_3$
be $y_u$. Select the $k_4$ lowest of these remaining observations, and let the y
coordinate of the highest of these $k_4$ be $y_l$. The tolerance region, consisting
of all points (x, y) with $x_l < x < x_u$ and $y_l < y < y_u$, depends on the sample
and, hence, so does the fraction of the population falling in (= covered by) this
region. Wald showed that the distribution of this fraction covered was
independent of the underlying bivariate distribution, so long as this latter
distribution had a continuous probability density function. He showed that the



distribution was the same as that arising in the one-dimensional case when a
tolerance region was set with the aid of $k_1 + k_2 + k_3 + k_4$ observations.
(Numerical approximation to these distributions will be discussed in Paper IV of this
series.)

The important device in this process, and the one which makes the conclusion
possible, is the discarding of the $k_1 + k_2$ observations after they have played
their part by determining $x_l$ and $x_u$.
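
The procedure of successive elimination can be sketched in a few lines. The
following Python function is an illustration written for this passage, not code
from Wald or from the present paper; it assumes that each of the four integers
is at least 1 and that no two observations share an x or a y value.

```python
def wald_tolerance_region(points, k1, k2, k3, k4):
    """Return (x_l, x_u, y_l, y_u) by successive elimination.

    points is a list of (x, y) pairs; k1 + k2 + k3 + k4 must not exceed
    the sample size.
    """
    if min(k1, k2, k3, k4) < 1 or k1 + k2 + k3 + k4 > len(points):
        raise ValueError("unsuitable choice of k1, k2, k3, k4")
    by_x = sorted(points, key=lambda pt: pt[0])
    x_u = by_x[-k1][0]            # lowest x among the k1 highest
    x_l = by_x[k2 - 1][0]         # highest x among the k2 lowest
    # Discard the k1 + k2 selected observations before looking at y.
    remaining = by_x[k2:len(by_x) - k1]
    by_y = sorted(remaining, key=lambda pt: pt[1])
    y_u = by_y[-k3][1]            # lowest y among the k3 highest remaining
    y_l = by_y[k4 - 1][1]         # highest y among the k4 lowest remaining
    return x_l, x_u, y_l, y_u

sample = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 9),
          (6, 2), (7, 7), (8, 4), (9, 6), (10, 10)]
print(wald_tolerance_region(sample, 1, 1, 1, 1))
```

Note that the y limits are taken only from the observations that survive the
first discarding step, which is the device the text singles out.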

We shall shortly be able to describe this procedure of Wald’s as a special case 
of a more general procedure, but we shall first go back to the simplest one dimen¬ 
sional case to explain some of our notions and terminology. 

Consider the uniform distribution from 0 to 1, draw a sample of n, and let the
sample values, ordered according to size, be $t_1, t_2, \ldots, t_n$. These n values
divide the interval from 0 to 1 into the following n + 1 parts $(0, t_1), (t_1, t_2), \ldots,$
$(t_{n-1}, t_n), (t_n, 1)$ which we shall call blocks. Since the joint distribution of the
$t_i$ is well known, that of the lengths of these n + 1 blocks is easily found. This
distribution of lengths would be unimportant, if it were not at the same time the
distribution of the fractions of the population covered by the blocks. As is
shown later, this distribution of fractions covered, or, more simply, of coverages,
has the following properties:

(i) the fractions covered add up to 1. 

(ii) the distribution is completely symmetrical. 

Property (ii) makes intuitive the result of Wilks [3, 1941] that the distributions
of the coverage of regions obtained

(a) by removing the $k_1 + k_2$ left-most blocks,

(b) by removing the $k_1$ left-most and the $k_2$ right-most blocks

are identical. The specific distribution obtained satisfies

(iii) if the coverages are taken as barycentric coordinates on an n-simplex,
the distribution over the simplex is uniform,

(iv) the sum of the coverages of any k preselected blocks of the n + 1 has
the well-known distribution

$$\Pr\{\text{sum of } k \text{ coverages} < t\} = I_t(n - k + 1, k)$$

where $I_p(n, m)$ is the incomplete Beta function.
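
Properties (i) and (ii) together imply that every one of the n + 1 coverages
has mean 1/(n + 1). A minimal simulation sketch for the uniform case follows;
the function names, the seed, and the number of trials are arbitrary choices
made for illustration.

```python
import random

def block_lengths(sample):
    """Lengths of the n + 1 blocks cut from (0, 1) by the ordered sample."""
    cuts = [0.0] + sorted(sample) + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]

def mean_block_lengths(n, trials, seed=0):
    """Monte Carlo average of each block length over repeated samples of n."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        lengths = block_lengths([rng.random() for _ in range(n)])
        totals = [t + x for t, x in zip(totals, lengths)]
    return [t / trials for t in totals]

print(block_lengths([0.6, 0.1, 0.35]))
print(mean_block_lengths(n=4, trials=20000))
```

With n = 4 each of the five averages should lie close to 0.2, whichever block
is considered, which is the symmetry the text describes.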

We shall call a set of blocks, derived from a sample, whose coverages behave in 
this general way a set of statistically equivalent blocks . Normally this will be 
abbreviated to se-blocks. (A precise definition is given in section 4.) 

We shall concentrate much of our attention on all the blocks and their sym¬ 
metrical character, rather than on the tolerance region formed by deleting k 
of them, since our results will then be applicable to many other problems. 

Now we can generalize Wald's original procedure. Let $W_1, W_2, \ldots, W_n$
be a sample of n (we shall not need to consider its distribution) and let
$\varphi_1, \varphi_2, \ldots, \varphi_n$ be n numerically valued functions of W, possibly alike, possibly
distinct, such that $\varphi_1(W), \varphi_2(W), \ldots, \varphi_n(W)$ have a joint distribution. Proceed
as follows:





Order the $W_i$ according to the numbers $\varphi_1(W_i)$, select the $W_i$ for which
$\varphi_1(W_i)$ is largest and denote it by $W_{i(1)}$. The first block contains all W such that

(2.1a) $\varphi_1(W) > \varphi_1(W_{i(1)})$.

Discarding $W_{i(1)}$, order the remaining $W_i$ according to the values of $\varphi_2(W_i)$
and select as $W_{i(2)}$ the one giving the largest value. The second block contains
all W such that

$\varphi_1(W) < \varphi_1(W_{i(1)})$,

$\varphi_2(W) > \varphi_2(W_{i(2)})$.

Continue this process. The mth block, for $m \le n$, will be defined by

(2.1m) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, m - 1$,

$\varphi_m(W) > \varphi_m(W_{i(m)})$,

and the (n + 1)st block by

(2.1n) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, n$.

(A graphical example of this construction is given shortly.) This set of n + 1
blocks will be statistically equivalent whenever the cumulative distribution of
each function is continuous.
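
The construction just described can be sketched as follows. This is an
illustration under stated assumptions rather than a definitive implementation:
the functions are passed as ordinary Python callables, ties are assumed not to
occur among the sample values, and the names `build_cuts` and `block_index` are
ours.

```python
def build_cuts(sample, functions):
    """Return the successive cut values, one for each function used.

    functions[k] plays the role of the (k + 1)th function: at each step the
    point with the largest value of the current function, among the points
    not yet discarded, supplies the cut and is then discarded.
    """
    remaining = list(sample)
    cuts = []
    for phi in functions[:len(sample)]:
        chosen = max(remaining, key=phi)
        cuts.append(phi(chosen))
        remaining.remove(chosen)
    return cuts

def block_index(w, functions, cuts):
    """1-based index of the block containing w, or None on a boundary."""
    for k, (phi, a) in enumerate(zip(functions, cuts), start=1):
        value = phi(w)
        if value > a:
            return k
        if value == a:
            return None
    return len(cuts) + 1

# One-dimensional usage: every function is the identity, so the blocks are
# the intervals cut off successively from the top of the sample.
identity = lambda w: w
functions = [identity, identity, identity]
cuts = build_cuts([0.2, 0.9, 0.5], functions)
print(cuts, [block_index(w, functions, cuts) for w in (0.95, 0.7, 0.3, 0.1)])
```

Choosing different callables, for example coordinates or linear combinations of
coordinates of a bivariate point, gives blocks of other shapes without any
change to the two functions.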

To specialize this to the case described above, let W be a pair (x, y) of numbers
and let

(i) the first $k_1$ $\varphi$'s be the x-coordinate of W,

(ii) the next $k_2$ $\varphi$'s be minus the x-coordinate of W,

(iii) the next $k_3$ $\varphi$'s be the y-coordinate of W,

(iv) the next $k_4$ $\varphi$'s be minus the y-coordinate of W,

(v) the remaining $\varphi$'s be arbitrary.

Then the first $k_1$ blocks will contain all W for which

$$x = \varphi_j(W) > \varphi_j(W_{i(j)}), \qquad j = 1, 2, \ldots, k_1,$$

that is, for which

$$x > x_u = \varphi_{k_1}(W_{i(k_1)}).$$

Similarly, the next $k_2 + k_3 + k_4$ blocks will contain all W with

$$x < x_l,$$

$$y > y_u, \quad x_l < x < x_u,$$

$$y < y_l, \quad x_l < x < x_u,$$

respectively, and the removal of these $k_1 + k_2 + k_3 + k_4$ blocks leaves Wald's
tolerance region (plus the boundaries where $x = x_u$, $x = x_l$, $y = y_u$, $y = y_l$).
There would be no point in this more general wording, if it did not include 





new cases of some interest. We give now, in graphic terms, an example of such 
a case. 

We deal with a sample of n bivariate observations, which we think of as plotted 
on a map so that we can use geographical language. The number n is rather 
large, and we wish to construct a tolerance region by deleting 12 blocks. We 
proceed as follows: 

Find the most northerly point, draw an East-West line through it, and shade
the area North of the line. Find the most easterly point in the unshaded area,
draw a North-South line through it, and shade the unshaded area East of the
line. Find the most southerly point (always working in the unshaded area),
draw an East-West line through it and shade the area South of the line. Find
the most westerly point, draw a North-South line through it, and shade the area
West of the line. Find the most northeasterly point, draw a NW-SE line through
it and shade the area northeast of the line. Find the most southeasterly point,
draw a NE-SW line through it, and shade the area southeast of the line. Repeat
this 6 times more, choosing in succession the most southwesterly, northwesterly,
northerly, easterly, southerly, and westerly points. The remaining points will
now lie in an unshaded area surrounded by a polygon, which will have 8 (or
perhaps fewer) sides. The inside of this polygon is the desired tolerance region.

Fig. 1





Figure 1 shows the final result, starting from n = 25. The practicing statis¬ 
tician is invited to try an example of his own with n at least 100. 

Other newly accessible cases can easily be invented by the reader, after he con¬ 
siders this example carefully. 

The use of a single W and n functions $\varphi_i$ has two virtues; it simplifies
notation and frees the intuition, as compared with the use of n chance quantities

$$Z_i = \varphi_i(W).$$

If the bivariate situation above were regarded as a 12-variate situation, where
the variates were, in order, $(y, x, -y, -x, x + y, x - y, -x - y, -x + y,$
$y, x, -y, -x)$, then the original Wald procedure with $k_1 = k_3 = \cdots = k_{23} = 1$;
$k_2 = k_4 = \cdots = k_{24} = 0$ would apply to construct the same region. Yet
even if x and y had a bivariate normal distribution, Wald's proof would not
apply without extension. For the 12-dimensional distribution is highly singular
(it is concentrated on a 2-dimensional plane in 12-dimensional space) and there
is no hope of a density function. An extension of Wald's result to the case
where the 12-dimensional joint cumulative distribution function is continuous
(as is the case in this example when x and y have a continuous joint cumulative)
is clearly needed.
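
The geographical example can be tried on simulated data. The sketch below peels
off twelve points using the twelve variates listed above, in that order, and
then checks that the points left over lie inside all twelve lines. The sample
of 100 bivariate normal points, the seed, and the function names are hypothetical
choices made for illustration.

```python
import random

# The twelve variates listed in the text, in order of use; p is a pair (x, y).
DIRECTIONS = [
    lambda p: p[1], lambda p: p[0], lambda p: -p[1], lambda p: -p[0],
    lambda p: p[0] + p[1], lambda p: p[0] - p[1],
    lambda p: -p[0] - p[1], lambda p: -p[0] + p[1],
    lambda p: p[1], lambda p: p[0], lambda p: -p[1], lambda p: -p[0],
]

def peel(points):
    """Return the 12 cut values and the points left inside the polygon."""
    remaining = list(points)
    cuts = []
    for phi in DIRECTIONS:
        extreme = max(remaining, key=phi)   # most extreme unshaded point
        cuts.append(phi(extreme))
        remaining.remove(extreme)
    return cuts, remaining

def in_region(point, cuts):
    """True when the point lies strictly inside every one of the 12 lines."""
    return all(phi(point) < a for phi, a in zip(DIRECTIONS, cuts))

rng = random.Random(1)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
cuts, inside = peel(sample)
print(len(inside), all(in_region(p, cuts) for p in inside))
```

A fresh observation can be tested against the region with `in_region`, and the
proportion of fresh observations falling inside gives an empirical estimate of
the coverage for that particular sample.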

When we come to deal with the case where the cumulative need not be
continuous we shall meet a further difficulty, namely “ties”. But if, as in the
present case, the cumulative is continuous, it is easy to see that the probability
that $\varphi_i(W_j) = \varphi_i(W_k)$ for any i and any j ≠ k is zero.

3. Terminology and notation. A quantity which has a probability
distribution we call a chance quantity (it has frequently been called a random variable).
The term chance quantity does not imply that its values are single real numbers;
they may be single real numbers (when we also speak of a real chance quantity),
sets of n real numbers, or more general objects. The cumulative distribution
function, or cumulative, of a single real chance quantity, X, is defined by

$$F(t) = \Pr\{X < t\},$$

except perhaps at the discontinuities of F. We have used here the notation
$\Pr\{k(X)\}$ to indicate the probability that k(X) holds, and we have followed our
policy of using capital letters for chance quantities and the corresponding
lower case letters for their values.

The set of values of W, or, as we shall say, the W-set, for which, for example,
$\varphi(W) < 3$, will be denoted by

$$\{W \mid \varphi(W) < 3\}.$$

We shall wish to compute probabilities associated with one or more functions
of a chance quantity; usually we will emphasize that these functions shall be
measurable with respect to the probability measure underlying the distribution
of W by asserting that they have a joint cumulative, which is defined by

$$F(t_1, t_2, \ldots, t_k) = \Pr\{\varphi_1(W) < t_1, \varphi_2(W) < t_2, \ldots, \varphi_k(W) < t_k\},$$





(except possibly at discontinuities of F) and which does not exist unless the $\varphi_i$
are measurable with respect to the unknown underlying distribution of W. In
cases where we neglect to remind the reader, it is still assumed that the functions
are measurable.

The coverage of a W-set, which may itself be a chance quantity, is defined by

$$\text{Coverage of } S = \Pr\{W \in S\}.$$

When S is a chance quantity, its coverage is also a chance quantity. The
barycentric simplex (of dimension n) is the set of points in (n + 1)-dimensional
Euclidean space $(t_1, t_2, \ldots, t_{n+1})$ with $t_1 + t_2 + \cdots + t_{n+1} = 1$ and $0 \le t_i \le 1$.
The name comes from the representation of the point $(t_1, \ldots, t_{n+1})$ as the
center of gravity (in mechanical terms) or mean (in statistical terms) of the
distribution where a fraction $t_i$ is concentrated at the ith vertex. (In order, the
vertices are (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), etc.) The uniform distribution
on this simplex has an (n-dimensional) density

$$n!\,dt_1\,dt_2 \cdots dt_n, \qquad (0 < t_1, t_2, \ldots, t_n,\ 1 - t_1 - t_2 - \cdots - t_n < 1),$$

and the cumulative

$$T(x_1, x_2, \ldots, x_{n+1}) = n! \int\!\!\int \cdots \int dt_1\,dt_2 \cdots dt_n$$

where the integration is over the range where $0 \le t_i \le x_i$ and at the same time
$t_1 + t_2 + \cdots + t_{n+1} = 1$.


4. The blocks determined by n values of W. We deal now with a population
of W's (a probability measure $\mu$ on the space $T = \{w\}$), a family of functions
$\varphi_1, \varphi_2, \ldots, \varphi_m$ of W with a joint cumulative (measurable with respect to $\mu$)
and a set of values $w_1, w_2, \ldots, w_n$ $(w_i \in T)$.

(4.1) Definition. The set $w_1, w_2, \ldots, w_n$ and the functions $\varphi_1, \varphi_2, \ldots, \varphi_m$
define blocks as follows:

(4.2) $S_1 = \{w \mid \varphi_1(w) > a_1\}$,

where $a_1 = \max_i \varphi_1(w_i) = \varphi_1(w_{i(1)})$, which defines i(1).

(4.3) $S_2 = \{w \mid \varphi_1(w) < a_1,\ \varphi_2(w) > a_2\}$,

where $a_2 = \max_{i \ne i(1)} \varphi_2(w_i) = \varphi_2(w_{i(2)})$, $i(2) \ne i(1)$, which defines i(2). And in
general, for $1 < k \le \min(m, n)$,

(4.4) $S_k = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_{k-1}(w) < a_{k-1},\ \varphi_k(w) > a_k\}$,



NON-PARAMETRIC ESTIMATION 




where $a_k = \max' \varphi_k(w_i) = \varphi_k(w_{i(k)})$, the maximum being taken over all i except
i(1), i(2), ..., i(k − 1); and i(k) being chosen distinct from all i(j), j < k.

If $m \ge n$, then

(4.5) $S_{n+1} = \{w \mid \varphi_1(w) < a_1, \dots, \varphi_n(w) < a_n\}$.

If $m \le n$, then

(4.6) $S_{m|n+1} = \{w \mid \varphi_1(w) < a_1, \dots, \varphi_m(w) < a_m\}$.

The result of this definition is to use $w_1, \dots, w_n$ and $\varphi_1, \dots, \varphi_m$ to define
n + 1 blocks (one more than there are w's) in case there are enough functions,
and, in case there are not enough functions, to define one small block, $S_i$, for
each function plus one large remainder $S_{m|n+1}$. We notice

(4.7) Remark. The blocks of (4.1) are well defined unless $\varphi_i(w_j) = \varphi_i(w_k)$ for
some i, j, k with $j \ne k$.
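The construction of (4.1) can be sketched in code. This is an illustrative sketch under assumptions, not the author's procedure as published: the names `block_cuts` and `block_index` are hypothetical, the functions are passed as ordinary Python callables, and ties (excluded by the Remark) are not handled specially.

```python
def block_cuts(sample, funcs):
    """Return the cut values a_1, ..., a_K with K = min(m, n), as in (4.2)-(4.4).

    At step k the maximum of phi_k is taken over the sample values not yet
    discarded, and the maximising value w_{i(k)} is then discarded.
    """
    remaining = list(sample)
    cuts = []
    for phi in funcs[:len(sample)]:
        best = max(range(len(remaining)), key=lambda i: phi(remaining[i]))
        cuts.append(phi(remaining[best]))
        remaining.pop(best)  # discard w_{i(k)}
    return cuts


def block_index(w, funcs, cuts):
    """1-based index of the block containing w.

    The value len(cuts) + 1 denotes the final block, (4.5) or (4.6).
    """
    for k, (phi, a) in enumerate(zip(funcs, cuts), start=1):
        if phi(w) > a:
            return k
    return len(cuts) + 1
```

For example, with points in the plane and the functions "x-coordinate", "y-coordinate" and "minus the x-coordinate", the first block is a half-plane to the right, the second a strip at the top, and the third a strip at the left.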


5. Statement of results for the statistician. The central results can be stated 
as follows: 

(5.1) Theorem $A_{m|n+1}$. If $W_1, W_2, \dots, W_n$ are a sample of n from a
distribution, if $\varphi_1, \varphi_2, \dots, \varphi_m$, ($m \le n$), are m functions such that

$\varphi_1(W), \varphi_2(W), \dots, \varphi_m(W)$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \dots, S_m$ and $S_{m|n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, uniquely defined with probability one,

(ii) the distribution of the coverages

$c_i = \Pr\{w \text{ in } S_i\}$, $i = 1, 2, \dots, m$

and

$c_{m|n+1} = \Pr\{w \text{ in } S_{m|n+1}\}$

is the same as that of $t_1, t_2, \dots, t_m$ and $t_{m+1} + t_{m+2} + \cdots + t_{n+1}$, where the $t_i$
are uniformly distributed on the barycentric simplex with n + 1 vertices.
Conditions (5.1i) and (5.1ii) are the precise definition of a partial family of
statistically equivalent blocks of type n + 1 and an associated (m | n + 1) tolerance
region.

(5.2) Theorem $B_{n+1}$. If $W_1, W_2, \dots, W_n$ are a sample of n from a
distribution, and if $\varphi_1, \varphi_2, \dots, \varphi_m$, ($m \ge n$), are m functions such that

$\varphi_1(W), \dots, \varphi_m(W)$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \dots, S_{n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, defined with probability one,

(ii) the distribution of the coverages

$c_i = \Pr\{w \text{ in } S_i\}$, $i = 1, 2, \dots, n + 1$

is the same as that of $t_1, t_2, \dots, t_{n+1}$, where the $t_i$ are uniformly distributed
on the barycentric simplex with n + 1 vertices.

Conditions (5.2i) and (5.2ii) are the precise definition of a complete family of
statistically equivalent blocks. In Paper III we shall have to widen these notions
a little, and this form will then be qualified by the phrase "in the narrow sense".
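A small Monte Carlo experiment gives a rough check of the first moments implied by Theorem A. The sketch below is an illustration under stated assumptions, not a proof: W is taken uniform on the unit square, the two functions are the coordinates, and the helper names are hypothetical. For a uniform distribution on the simplex each $t_i$ has mean 1/(n + 1), so the two small blocks should each have mean coverage near 1/(n + 1) and the remainder near (n − 1)/(n + 1).

```python
import random


def coverages_unit_square(n, rng):
    """Coverages (c_1, c_2, c_{2|n+1}) for W uniform on the unit square.

    phi_1 is the x-coordinate and phi_2 the y-coordinate; n must be >= 2.
    """
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    i1 = max(range(n), key=lambda i: pts[i][0])
    a1 = pts[i1][0]
    rest = pts[:i1] + pts[i1 + 1:]       # discard w_{i(1)}
    a2 = max(p[1] for p in rest)
    c1 = 1.0 - a1                         # Pr{x > a1}
    c2 = a1 * (1.0 - a2)                  # Pr{x < a1, y > a2}
    return c1, c2, a1 * a2                # last entry: remainder block


def mean_coverages(n, trials, seed=0):
    """Average the three coverages over repeated samples of size n."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(trials):
        for j, c in enumerate(coverages_unit_square(n, rng)):
            sums[j] += c
    return [s / trials for s in sums]
```

Agreement of the means is only a necessary consequence of the theorem; checking the full joint distribution would need more than this sketch.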

6. Statement of results for the measure theorist. The construction of (4.1)
maps the product $T^n \times U^n$ into $E^{n+1}$ where $T$ is the set of w's (and hence $T^n$ is
the set of ordered n-tuples of w's), $U$ is the space of all real-valued functions
defined over $T$, measurable with respect to a fixed probability measure $\mu$, and
possessing a continuous cumulative, (i.e. $\mu(\{w \mid \varphi(w) = c\}) = 0$ for all real c),
and hence $U^n$ is the space of ordered n-tuples of such functions, and $E^{n+1}$ is
Euclidean (n + 1)-dimensional space. More precisely, the mapping is into the
barycentric simplex with n + 1 vertices, a subset of $E^{n+1}$, and is well defined except
for a set in $T^n$ of measure zero with respect to $\mu^n$, the power measure of $\mu$. In
these terms, we may restate theorem B as follows:

(6.1) Theorem $B_{n+1}$. Hold the n functions $\varphi_1, \varphi_2, \dots, \varphi_n$ and the probability
measure $\mu$ fixed, then $T^n$ is mapped into $B_n$ and the power measure $\mu^n$ is carried by
that mapping into a measure on $B_n$. This measure is always $n!$ times Lebesgue
measure.

7. Wald's principle. The essential principle behind Wald's process of
discarding observations is sufficiently fundamental to warrant a name of its own.
It can be stated, quite generally, in the two following forms:

(7.1) Wald's Principle (discrete form). Let W be a chance quantity,
and consider samples of n. Fix disjoint w-sets $A_1, A_2, \dots, A_m, B$. Consider
those samples of n for which exactly one value falls in each $A_i$ and the remaining
n − m fall in B. The distribution of the n − m falling in B is that of a random sample
of n − m from the distribution of W restricted to B. (i.e. $\mu_B(X) = [\mu(B)]^{-1} \mu(B \cap X)$.)

(7.2) Wald's Principle (conditional form). Let W be a chance quantity,
and $\varphi$ a function such that each value of $\varphi(W)$ has probability zero. Consider
samples of n. Then the conditional distribution of the $w_i$, given that

$\max_i \varphi(w_i) = a$,

is that of one $w_{i_0}$ with $\varphi(w_{i_0}) = a$ and a sample of n − 1 other $w_i$ from the
distribution of W restricted to $B = \{w \mid \varphi(w) < a\}$.
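The conditional form can be illustrated by simulation in the simplest case. This is a sketch under assumptions, not part of the original argument: W is taken uniform on (0, 1) and $\varphi(w) = w$, so the distribution restricted to $\{w < a\}$ is uniform on (0, a); dividing the remaining values by a should then give values that behave like independent uniforms on (0, 1). The function names are hypothetical, and n must be at least 3 for `summarize`.

```python
import random


def rescaled_remainder(n, rng):
    """Draw n uniforms, drop the maximum a, and rescale the rest by a."""
    ws = [rng.random() for _ in range(n)]
    a = max(ws)
    ws.remove(a)
    return [w / a for w in ws]


def summarize(n, trials, seed=0):
    """Monte Carlo mean of one rescaled value and of a product of two.

    Independent uniforms on (0, 1) would give 1/2 and 1/4 respectively.
    """
    rng = random.Random(seed)
    mean_one = 0.0
    mean_prod = 0.0
    for _ in range(trials):
        r = rescaled_remainder(n, rng)
        mean_one += r[0]
        mean_prod += r[0] * r[1]
    return mean_one / trials, mean_prod / trials
```

Only two moments are examined here, so the experiment is a plausibility check rather than a verification of the principle.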

(7.3) Central Lemma. Let W be a chance quantity and let $\varphi_1, \dots, \varphi_n$ be
functions with a joint cumulative such that $\varphi_i(w) = a$ has probability zero for each
i and a (i.e. the joint cumulative is continuous). Then the conditional distribution
of the remaining n − k w's, after k blocks have been chosen according to (4.1),
is that of a sample from the distribution of W restricted to

$B = \{w \mid \varphi_1(w) < a_1, \dots, \varphi_k(w) < a_k\}$,

where k = 1, 2, ..., n.

The proofs of these statements are elementary and direct. To establish (7.1)
we have only to show that given two sets in $B^{n-k}$, their probabilities on the
assumption that one $w_i$ is in each $A_i$ are in the ratio of their probabilities for an
unrestricted sample of n − k. But the probability of finding the n − k in a
set $R$, contained in $B^{n-k}$, and one $w_i$ in each $A_i$, is exactly

$\frac{n!}{(n-k)!}\, \mu(A_1)\, \mu(A_2) \cdots \mu(A_k)$

times the probability that n − k w's, known to be in $B^{n-k}$, will fall in $R$. This
establishes (7.1).

In order to prove (7.2) we must show that the probability of a set $R$ of
n-tuples $w_1, w_2, \dots, w_n$ is the same whether calculated directly or calculated by
the proposed conditional distribution. To this end, it is natural to decompose
$R$ as follows:

$R = R(1) + R(2) + \cdots + R(n) + Z$,

where $R(i)$ contains those $(w_1, \dots, w_n)$ in $R$ for which $\varphi(w_i) > \varphi(w_j)$ for all
$j \ne i$, and $Z$ contains the remaining $(w_1, \dots, w_n)$, which must involve at least
one tie $\varphi(w_j) = \varphi(w_k)$, $j \ne k$. Since $Z$ has probability zero, it will suffice to
establish the equality of the two calculations for sets of the form $R(i)$, and
because of symmetry we may restrict ourselves to sets of the form $R(1)$.

Given an integer N, we decompose the range of $\varphi(w)$ into Nn segments of equal
probability, which we may do because the cumulative of $\varphi$ is continuous. There
are then Nn values $b_k$, ($b_0 = -\infty$, $b_{Nn} = +\infty$), such that

$\Pr\{b_{k-1} < \varphi(w) < b_k\} = 1/Nn$.

We now decompose our set $R$ (which is of the form $R(1)$) as follows:

$R = R_2 + \cdots + R_{Nn} + Y$,

where $R_k$ contains those n-tuples

$(w_1, \dots, w_n)$ for which $b_{k-1} < \varphi(w_1) < b_k$

and $\varphi(w_i) < b_{k-1}$ for all i > 1. The remaining set $Y$ contains n-tuples where
the two largest $\varphi(w_i)$, (i = 1 and i = $i_0$), belong to the same interval. The
probability of this is less than

n(n - 1) f 1 V ^ 1 

K nN/ *“ 2N> 


2 





as calculated from the known distribution. Calculating from the conditional 
distribution, we find immediately a bound of 




n — 1 _ n — 1 y' k n — (k — 1)* 

~Jc (FnF ^ k 





An 
N ’ 


where $A_n$ is a constant depending only on n. Thus, as N increases, the
probabilities of the successive sets $Y$ tend to zero, calculated either way. To show
the equivalence of the two calculations it is now sufficient to show that they
agree for the sets $R_k$. But this is a case of (7.1) and the lemma is proved.

Now (7.3) follows by induction, applying (7.2) at each step.


8. Proof of theorems. We notice that Theorem $B_{n+1}$ is equivalent to Theorem
$A_{n|n+1}$, since, according to (4.1), $S_{n|n+1} = S_{n+1}$.

We have only to prove theorem $A_{m|n+1}$, which we do by induction on m.
For m = 1, it is exactly Wilks' [3, 1941] original one-dimensional theorem, and
is known. Let us assume it for m = k and demonstrate it for m = k + 1, for
by induction this will complete the proof.

We must deal with the blocks $S_1, S_2, \dots, S_k, S_{k+1}$ and $S_{k+1|n+1}$, (notation
as in (4.1) and (5.1)). We need the obvious

(8.1) Lemma. Since the cumulative of $\varphi_{k+1}$ is continuous, the union of $S_{k+1}$
and $S_{k+1|n+1}$ differs from $S_{k|n+1}$ by a set of zero probability.

Hence

$c_{k|n+1} = c_{k+1} + c_{k+1|n+1}$.

Since we know from the induction hypothesis that $c_1, c_2, \dots, c_k$ and $c_{k|n+1}$
have the correct joint distribution, we have only to show that $c_{k+1}$ and $c_1,
c_2, \dots, c_k$ have the correct joint distribution. Fix $c_1, c_2, \dots, c_k$. Then
$a_1, a_2, \dots, a_k$ must be fixed, and so (7.3) applies to the n − k $w_i$'s not
discarded after $a_1, a_2, \dots, a_k$ have been fixed. The conditional distribution of
$c_{k+1}$ must be that of a fixed number $(1 - c_1 - c_2 - \cdots - c_k)$, which is the
probability attached to $S_{k|n+1}$, times the coverage of one block based on a sample
of n − k, since the remaining n − k w's behave like a sample.

Consider the very particular case where w is uniformly distributed between
zero and one and $\varphi_i(w) \equiv w$. All that we have said in the last paragraph applies:
the conditional distribution of $c_{k+1}$ given $c_1, c_2, \dots, c_k$ is the same in the two
cases, hence the joint distribution of $c_1, c_2, \dots, c_k, c_{k+1}$ is the same in both
cases; but in this very particular case the joint distribution is known to be
that required by theorem $A_{k+1|n+1}$.





REFERENCES 

[1] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order
statistics", Annals of Math. Stat., Vol. 16 (1945), pp. 187-192. (Also cited as
Paper I.)

[2] A. Wald, "An extension of Wilks' method for setting tolerance limits", Annals of Math.
Stat., Vol. 14 (1943), pp. 46-65.

[3] S. S. Wilks, "Determination of sample sizes for setting tolerance limits", Annals of
Math. Stat., Vol. 12 (1941), pp. 91-96.



SOME BASIC THEOREMS FOR DEVELOPING TESTS OF FIT FOR 
THE CASE OF THE NON-PARAMETRIC PROBABILITY 
DISTRIBUTION FUNCTION, I 

By Bradford F. Kimball 
State Department of Public Service , New York , N. Y. 

1. Summary. In developing tests of fit based upon a sample $O_n(x_i)$ in the
case that the cumulative distribution function $F(X)$ of the universe of X's is
not necessarily a function of a finite number of specific parameters (sometimes
known as the non-parametric case), it has been pointed out by several writers
that the "probability integral transformation" is a useful device (cf. [1]-[4]).

The author finds that a modification of this approach is more effective. This
modification is to use a transformation of ordered sample values from a random
sample $O_n(x_i)$ based on successive differences of the cdf values $F(x_i)$.

A theorem is proved giving a simple formula for the expected values of the
products of powers of these differences, where all differences from 1 to n + 1 are
involved in a symmetrical manner.

The moment generating function of the test function defined as the sum of m 
squares of these successive differences is developed and the application of such 
a test function is briefly discussed. 

2. Introduction. Let the sample values be ordered so that

(2.1) $x_i \le x_{i+1}$, ($i = 1, 2, \dots, n - 1$).

Let $F_r$ denote the value of the cdf $F(X)$ associated with the rth ordered sample
value $x_r$. Thus

(2.2) $F_r = F(x_r)$.

Consider the following transformation of the ordered sample values $x_i$ based
upon the (hypothetically) known cumulative distribution function $F(X)$ which
will be taken as a continuous function of X over its admissible range:

$u_1 = F_1$,

(2.3) $u_r = F_r - F_{r-1}$, ($r = 2, 3, \dots, n$),

$u_{n+1} = 1 - F_n$.

The restrictions on $F_i$ are that

(2.4) $F_i \le F_{i+1}$, and $0 \le F_i \le 1$.

The above transformation (2.3) translates these conditions into the symmetrical
conditions

(2.5) $0 \le u_i$, and $u_1 + u_2 + \cdots + u_n + u_{n+1} = 1$.

A one-to-one correspondence between $u_i$ and $F_i$ exists if one of the $u_i$ be
omitted, say $u_p$. With $u_p$ omitted, the Jacobian of the transformation from $F_i$ to $u_i$
has value unity. The probability density of the sample $O_n(x_i)$, with $x_i$ ordered,
is given by

(2.6) $P[O_n(x_i)]\, dO_n = n!\, dF_1\, dF_2 \cdots dF_n$.

Hence with $u_p$ omitted,

(2.7) $P[O_n(x_i)]\, dO_n = n!\, du_1\, du_2 \cdots du_{p-1}\, du_{p+1} \cdots du_{n+1}$.

The sample space of the $u_i$ with $u_p$ omitted, is that portion of the (n + 1)-dimensional
Euclidean space of all the variables, bounded by the coordinate hyperplanes,
which is on the projection of the hyperplane (2.5) upon the hyperplane $u_p = 0$.
This is a region in the n-space of the $u_i$ with $u_p$ omitted, bounded by the
coordinate hyperplanes and the hyperplane

(2.8) $u_1 + u_2 + \cdots + u_{p-1} + u_{p+1} + \cdots + u_n + u_{n+1} = 1$.

Thus the formal integral of the pdf of the $u_i$ over sample space is

(2.9) $n! \int\!\!\int \cdots \int du_1 \cdots du_{p-1}\, du_{p+1} \cdots du_{n+1} = 1$

with $0 \le u_i$, and $u_i$ bounded above by the hyperplane (2.8).

It is now clear that both the pdf and the sample space of the $u_i$ (with $u_p$
omitted) are symmetrical in the $u_i$. This fact leads to complete symmetry of
the joint distribution function of any set of $u_i$, over i = 1 to n + 1 including $u_p$,
relative to the $u_i$ selected. Other interesting results are forthcoming.
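The transformation (2.3) is easy to compute once a cdf is given. The sketch below is illustrative only; the function name `spacings` is hypothetical, and it relies on the cdf being non-decreasing, so that sorting the cdf values is equivalent to ordering the sample first.

```python
def spacings(sample, cdf):
    """Return [u_1, ..., u_{n+1}] of (2.3) for a sample and a continuous cdf.

    u_1 = F_1, u_r = F_r - F_{r-1} for r = 2..n, and u_{n+1} = 1 - F_n,
    where F_r is the cdf value at the r-th ordered sample value.
    """
    f = sorted(cdf(x) for x in sample)
    edges = [0.0] + f + [1.0]
    return [edges[r + 1] - edges[r] for r in range(len(f) + 1)]
```

For a sample 7, 2, 5 from a universe assumed uniform on (0, 10), the cdf values are 0.2, 0.5, 0.7 and the four differences are 0.2, 0.3, 0.2, 0.3, which satisfy the conditions (2.5).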


3. Basic mathematical theorem. Using the techniques associated with the
Beta function, the expectation of the products of powers of the $u_i$ is found to be

(3.1) $E[u_r^p\, u_s^q\, u_t^w \cdots] = \Gamma(n+1)\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(w+1) \cdots / \Gamma(n + p + q + w + \cdots + 1)$

where r, s, t, etc., are any set of different indices (for the present other than the
omitted index) from the integers 1 to n + 1, and p, q, w, etc., are any real numbers
greater than minus one. The relation (3.1) can further be generalized to the case
where the omitted variable may be included. This will be proved for the case n = 2,
with p, q and w taken as integers. The generalization can be concluded from
inspection. Thus with

$u_3 = 1 - u_1 - u_2$,

$E[u_1^p\, u_2^q\, u_3^w] = 2! \int_0^1 u_2^q\, du_2 \int_0^{1-u_2} u_1^p (1 - u_1 - u_2)^w\, du_1$

$= 2! \int_0^1 u_2^q (1 - u_2)^{p+w+1}\, du_2 \int_0^1 s^p (1 - s)^w\, ds$

$= \frac{2!\, p!\, w!}{(p + w + 1)!} \int_0^1 u_2^q (1 - u_2)^{p+w+1}\, du_2 = \frac{2!\, p!\, q!\, w!}{(p + q + w + 2)!}$.


Hence the theorem: 





Theorem. Given a random sample of n values of X from a universe with cdf
$F(X)$ which is continuous over the range of X. With the sample values $x_i$ ordered
so that $x_i \le x_{i+1}$ define a set of n + 1 variables $u_i$ as the successive differences of
$F(x_i)$ by the relations (2.3). The expected value of the product of real powers greater
than minus one of any or all of the $u_i$, ($i = 1, 2, \dots, n + 1$), is given by the
relation (3.1) above (not subject to the omission of $u_p$).
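Relation (3.1) is straightforward to evaluate numerically. The following sketch is an illustration, with the hypothetical function name `expected_product`; it works with logarithms of the Gamma function so that large n does not overflow, and it assumes the list of exponents has at most n + 1 entries, one per distinct index.

```python
from math import exp, lgamma


def expected_product(n, powers):
    """Evaluate (3.1): E[u_r^p u_s^q ...] for distinct indices r, s, ...

    `powers` holds the exponents p, q, ... (each greater than -1).
    """
    log_value = (
        lgamma(n + 1)
        + sum(lgamma(p + 1) for p in powers)
        - lgamma(n + sum(powers) + 1)
    )
    return exp(log_value)
```

With a single exponent 1 this reduces to 1/(n + 1), the expected value of any one difference.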

There are many interesting consequences of this theorem. Perhaps the most
striking is the following: 

Corollary 1. Let a range $a(m, k)$ for positive integer m be defined by

(3.2) $a(m, k) = F(x_{k+m}) - F(x_k)$

with $k = 0, 1, 2, \dots, n$, and $m \le n + 1 - k$

under the convention

$F(x_0) = 0$, $F(x_{n+1}) = 1$.

The probability distribution of $a(m, k)$ is independent of k and hence is the same as
that of $F(x_m)$.

Another interesting consequence (not new) is the following: 

Corollary 2. The correlation of $u_i$ and $u_k$, $i \ne k$, is the same for all pairs
(i, k) over the range of indices from 1 to n + 1, and has the value $-1/n$.
Introducing the notation

(3.3) $[n + r]_r = (n + r)(n + r - 1) \cdots (n + 1)$,

the corollary follows from the relationships

$E(u_i) = 1/(n + 1)$, $E(u_i^2) = 2/[n + 2]_2$, $E(u_i u_k) = 1/[n + 2]_2$.
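The value $-1/n$ can be confirmed by exact rational arithmetic from the three relationships just stated. This is a small illustrative sketch; the function name `spacing_correlation` is hypothetical.

```python
from fractions import Fraction


def spacing_correlation(n):
    """Correlation of u_i and u_k (i != k) from the moments in Corollary 2."""
    mean = Fraction(1, n + 1)                    # E(u_i)
    second = Fraction(2, (n + 2) * (n + 1))      # E(u_i^2) = 2/[n+2]_2
    cross = Fraction(1, (n + 2) * (n + 1))       # E(u_i u_k) = 1/[n+2]_2
    variance = second - mean ** 2
    covariance = cross - mean ** 2
    return covariance / variance
```

The variance works out to $n/[(n+1)^2(n+2)]$ and the covariance to $-1/[(n+1)^2(n+2)]$, whose ratio is $-1/n$.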

The fact that the correlation between any two frequency differences $u_i$ and $u_k$
is negative leads to the following more general relationship:

Corollary 3. For any set of different indices i, j, k, etc., and for any positive
numbers p, q, r, etc., the expectation of the product of the powers p, q, r, ... of
$u_i, u_j, u_k, \dots$ is less than the product of the expectations of the powers taken
separately:

(3.4) $E[u_i^p\, u_j^q\, u_k^r \cdots] < E(u_i^p) \cdot E(u_j^q) \cdot E(u_k^r) \cdots$

This follows from generalization of the relation

$\frac{\Gamma(n+1)\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n + p + q + r + 1)} < \frac{[\Gamma(n+1)]^3\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n + p + 1)\,\Gamma(n + q + 1)\,\Gamma(n + r + 1)}$.

The above theorem suggests the possibility of test functions for fitted distribu¬ 
tions, relative to a universe with a cdf which, since it is merely conditioned by a 
sufficient hypothesis for the theorem, may be of the non-parametric type. 





A test function of the form

(3.5) $Y = \sum_{m} u_i^p$, p real and positive,

might first come to mind. If p = 1, compensatory effects of deviations reduce
the efficiency of the test function. One is thus led first to consider the test
function (3.5) for the case p = 2.

4. The moments of the probability distribution of $y_m = \sum u_i^2$. We are
first concerned with the problem of the determination of the moments of the
function

(4.1) $y_m = \sum_{m} u_i^2$

where i ranges over any particular fixed set of m integers which for simplicity
is usually taken as the first m.

One first recalls the fact that the result is independent of which m indices have
been selected; and that the expected value of any combination of powers is
independent of which specific subscripts of $u_i$ are involved.

Since the $u_i$ are correlated, principles of combinatory analysis are involved in
determining the moments of $y_m$. One possible way of obtaining the moments
is as follows:

Let $\nu_r$ denote the rth moment of $y_m$ about $y_m = 0$. Thus

(4.2) $E[(y_m)^r] = \nu_r = E[(\sum_{m} u_i^2)^r]$.

Now in the expansion of $(\sum_{m} u_i^2)^r$, the sum of the power indices of each term
is 2r. Thus referring back to (3.1) and (3.3) it will be noted that the expected
value of each such term will have the common factor

$1/[n + 2r]_{2r}$.

Consider a general term of the expansion of $(\sum_{m} u_i^2)^r$:

$C_{r_1 r_2 \cdots r_k}\, u_{i_1}^{2r_1} u_{i_2}^{2r_2} \cdots u_{i_k}^{2r_k}$, with $r_1 + r_2 + \cdots + r_k = r$.

Clearly

$E(u_{i_1}^{2r_1} u_{i_2}^{2r_2} \cdots u_{i_k}^{2r_k}) = (2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$,

and the coefficient $C_{r_1 r_2 \cdots r_k}$ is the multinomial coefficient

$C_{r_1 r_2 \cdots r_k} = \frac{r!}{r_1!\, r_2! \cdots r_k!}$.

Now in the expansion of $(\sum_{m} u_i^2)^r$ group the terms which have the same set of
k values of $r_i$, irrespective of which indices of $u_i$ are involved. The number of
such terms (since each involves k different indices) is $\binom{m}{k}$. If $r_1, r_2, \dots, r_k$
are all different each combination could be taken in $k!$ different ways. Thus with
r's all different and fixed, the sum of all coefficients of terms with same
combination of $2r_i$ powers (irrespective of variation of indices of the $u_i$) is

$\binom{m}{k}\, k!\, \frac{r!}{r_1!\, r_2! \cdots r_k!}$.

This would then constitute the total multiplier for

$(2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$

for a given set of k r's which are all different.

If some of the r's are repeated, let $k_1, k_2, \dots, k_s$ denote the number of repetitions
of each different $r_i$ ($k_i \ge 1$, and $k_1 + k_2 + \cdots + k_s = k$). Then each
combination of the k r's corresponding to a set of k products could be taken in

$k!/(k_1!\, k_2! \cdots k_s!)$

different ways. Hence the lemma:

Lemma 1. Consider all admissible sets of k different subscripts of $u_i$ and a fixed
set of values of $r_i = r_1, r_2, \dots, r_k$ where

$r_1 + r_2 + \cdots + r_k = r$

such that s of these r's are different, and the number of repetitions in the set of r's is
given by $k_1, k_2, \dots, k_s$ ($k_i \ge 1$, and $k_1 + k_2 + \cdots + k_s = k$). The composite
coefficient of the terms in $\nu_r$ involving the factor

$(2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$

is given by

(4.3) $\binom{m}{k}\, \frac{k!}{k_1!\, k_2! \cdots k_s!}\, \frac{r!}{r_1!\, r_2! \cdots r_k!}$.
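Lemma 1 lends itself to mechanical evaluation: sum the coefficient (4.3) times the factor $(2r_1)! \cdots (2r_k)!$ over all partitions of r into k parts, and divide by $[n + 2r]_{2r}$. The sketch below is one possible way to organise that computation in exact arithmetic; the function names are hypothetical and the enumeration of partitions is a choice of this illustration, not something specified in the text.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def partitions(r, max_part=None):
    """Yield the partitions of r as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, max_part), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def bracket(n, j):
    """[n + j]_j = (n + j)(n + j - 1)...(n + 1), notation (3.3)."""
    out = 1
    for i in range(1, j + 1):
        out *= n + i
    return out


def moment(r, m, n):
    """r-th moment about zero of y_m (sum of m squared differences), by Lemma 1."""
    total = 0
    for parts in partitions(r):
        k = len(parts)
        if k > m:
            continue  # not enough distinct indices among the m chosen
        repeats = 1
        for count in Counter(parts).values():
            repeats *= factorial(count)          # k_1! k_2! ... k_s!
        multinomial = factorial(r)
        factor = 1
        for ri in parts:
            multinomial //= factorial(ri)        # r!/(r_1! ... r_k!)
            factor *= factorial(2 * ri)          # (2r_1)! ... (2r_k)!
        total += comb(m, k) * (factorial(k) // repeats) * multinomial * factor
    return Fraction(total, bracket(n, 2 * r))
```

For r = 1 this gives $2m/[n+2]_2$, and for r = 2 the numerator is $24m + 8\binom{m}{2}$; with m = n + 1 the second and first moments combine to the variance $4n/[(n+2)^2(n+3)(n+4)]$.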


Examples of computation of $\nu_r$ by means of the above lemma. The first order
moment is given by

(4.4) $\nu_1 = E(\sum_{m} u_i^2) = m\, 2!/[n + 2]_2$.

The second order moment is given by

$\nu_2 = E[(\sum_{m} u_i^2)^2] = C_1 E(u_i^4) + C_2 E(u_i^2 u_j^2)$,

and determining the values of $C_i$ from Lemma 1,

$\nu_2 = [m\, 4! + \binom{m}{2} \frac{2!}{2!} \frac{2!}{1!\,1!}\, 2!\, 2!]/[n + 4]_4$,

(4.5) $\nu_2 = [m\, 4! + 8 \binom{m}{2}]/[n + 4]_4 = [m + \tfrac{1}{3}\binom{m}{2}]/\binom{n+4}{4}$.





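Again for the third order moment,

$\nu_3 = E[(\sum_{m} u_i^2)^3] = C_1 E(u_i^6) + C_2 E(u_i^4 u_j^2) + C_3 E(u_i^2 u_j^2 u_k^2)$,

and using Lemma 1,

$\nu_3 = [m\, 6! + \binom{m}{2} \frac{2!}{1!\,1!} \frac{3!}{2!\,1!}\, 2!\, 4! + \binom{m}{3} \frac{3!}{3!} \frac{3!}{1!\,1!\,1!}\, 2!\, 2!\, 2!]/[n + 6]_6$

$= [m\, 6! + \binom{m}{2}\, 2!\, 3!\, 4! + \binom{m}{3}\, 2!\, 2!\, 2!\, 3!]/[n + 6]_6$,

or

(4.6) $\nu_3 = [m + \tfrac{2}{5}\binom{m}{2} + \tfrac{1}{15}\binom{m}{3}]/\binom{n+6}{6}$.

Similarly writing the fourth moment in the form

$\nu_4 = C_1 E(u_i^8) + C_2 E(u_i^6 u_j^2) + C_3 E(u_i^4 u_j^4) + C_4 E(u_i^4 u_j^2 u_k^2) + C_5 E(u_i^2 u_j^2 u_k^2 u_l^2)$

and using Lemma 1 it reduces to

(4.7) $\nu_4 = [m + \tfrac{13}{35}\binom{m}{2} + \tfrac{3}{35}\binom{m}{3} + \tfrac{1}{105}\binom{m}{4}]/\binom{n+8}{8}$.

Higher order moments of the probability distribution function may be
computed as desired.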

An alternate method of computing the moments of the distribution of this test
function is the following:

Consider a function $g_0(x)$ such that

(4.8) $d^r g_0(0)/dx^r = (2r)!$, $g_0(0) = 1$.

Thus

(4.9) $E[u_i^{2r}] = [d^r g_0(0)/dx^r]/[n + 2r]_{2r}$.

From the principles of combinatory analysis of linear operators, it follows that¹

(4.10) $E[(\sum_{m} u_i^2)^r] = [d^r [g_0(x)]^m/dx^r\,|_{x=0}]/[n + 2r]_{2r}$.

Although this is an enlightening analytical form, actual computations seem to be
simpler with the use of Lemma 1.
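Relation (4.10) can also be evaluated directly by treating $g_0(x)$ as a truncated power series with coefficients $(2j)!/j!$, raising it to the mth power, and reading off r! times the coefficient of $x^r$. The sketch below is illustrative only; the function name `moment_via_g0` is hypothetical and the repeated series multiplication is a simple choice made here, not the author's method.

```python
from fractions import Fraction
from math import factorial


def moment_via_g0(r, m, n):
    """Evaluate (4.10): the r-th moment of y_m from the r-th derivative of g_0^m at 0."""
    # Coefficients of g_0(x) up to x^r: g_0 = sum_j (2j)!/j! * x^j, from (4.8).
    g = [Fraction(factorial(2 * j), factorial(j)) for j in range(r + 1)]
    power = [Fraction(1)] + [Fraction(0)] * r    # series for g_0^0 = 1
    for _ in range(m):
        # Multiply by g_0, discarding terms beyond x^r.
        power = [
            sum(power[i] * g[j - i] for i in range(j + 1))
            for j in range(r + 1)
        ]
    derivative = factorial(r) * power[r]         # d^r/dx^r at x = 0
    denominator = 1
    for i in range(1, 2 * r + 1):
        denominator *= n + i                     # [n + 2r]_{2r}
    return derivative / denominator
```

For r = 2 the derivative is $m(m-1)\,(g_0')^2 + m\,g_0''$ at zero, that is $4m(m-1) + 24m = 24m + 8\binom{m}{2}$, in agreement with (4.5).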


¹ One way of seeing this is to first think of the $u_i$ as statistically independent. The
numerators of the resulting terms would be the same as in (4.10). When the $u_i$ are taken
as dependent, by virtue of (3.1) the numerators will remain the same while all denominators
will reduce to $[n + 2r]_{2r}$.





Moment generating function. The moment generating function of the
probability distribution of $y_m$ can be written as

(4.11) $E(e^{t y_m}) = G_0(t, m) = 1 + \sum_{r=1}^{\infty} [d^r [g_0(x)]^m/dx^r\,|_{x=0}]/[n + 2r]_{2r} \cdot t^r/r!$

with

$g_0(x) = 1 + 2!\, x + 4!\, x^2/2! + 6!\, x^3/3! + \cdots + (2r)!\, x^r/r! + \cdots$

$[n + 2r]_{2r} = (n + 2r)(n + 2r - 1) \cdots (n + 1)$.

Although $g_0(x)$ exists only as a formal power series, $G_0(t, m)$ is defined by (4.11)
as a power series with positive coefficients, converging for all t.


5. Some comments on test function, p = 2. At the present time the study of
the test function for p = 2 has not gone far enough to justify publication of
results. One difficulty is that although its asymptotic distribution function
appears to be normal, the convergence towards normality may be extremely slow
in some cases.

Furthermore there are indications that the case m = n + 1 will give the most
definitive results not only because the complete range of data is used, but also
because errors of Type II would in general have a less erratic effect.

For the case m = n + 1 the mean, variance and third and fourth reduced
moments (i.e. moments about the mean divided by corresponding power of $\sigma$)
are:

Case m = n + 1.

(5.1) $E(y_{n+1}) = 2/(n + 2)$, $\sigma^2 = 4n/[(n + 2)^2 (n + 3)(n + 4)]$,

$\alpha_3 = \mu_3/\sigma^3 = \frac{10n - 4}{(n + 5)(n + 6)} \left[\frac{(n + 3)(n + 4)}{n}\right]^{1/2}$,

$\alpha_4 = \mu_4/\sigma^4 = \left[\frac{3(n + 3)(n + 4)}{(n + 5)(n + 6)(n + 7)(n + 8)}\right] \left[\frac{n^3 + 101n^2 + 14n - 8}{n}\right]$,

$\alpha_4 - 3 = \frac{6(41n^4 + 241n^3 + 118n^2 - 784n - 48)}{n(n + 5)(n + 6)(n + 7)(n + 8)}$.


If data is not grouped the test may be applied as follows: Given a function
$Q(X)$ which has been fitted to the cdf $F(X)$. From a random sample of size n
with $x_i$ ordered as in (2.1) compute the successive differences of $Q(x_i)$ to obtain
the variables $u_i^*$. Then consider the sum of the squares

$U^* = \sum_{n+1} u_i^{*2}$.

If $Q(X)$ is a true representation of $F(X)$ the variation of $U^*$ will follow that of
$y_{n+1}$. Thus the expected value of $U^*$, its variance etc. will be independent of
the fitted function $Q(X)$, which represents certain advantages over the $\chi^2$ test.
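The procedure just described can be sketched as follows. This is an illustration under assumptions, not a complete test: the function names are hypothetical, the fitted cdf is passed as a Python callable, and only the null mean and variance of (5.1) are supplied, so the choice of a rejection limit is left open.

```python
def u_star(sample, fitted_cdf):
    """Sum of squared successive differences of Q(x_i), including both end gaps."""
    q = sorted(fitted_cdf(x) for x in sample)
    edges = [0.0] + q + [1.0]
    return sum((edges[i + 1] - edges[i]) ** 2 for i in range(len(q) + 1))


def null_mean_and_variance(n):
    """Mean and variance of y_{n+1} from (5.1), valid when Q equals F."""
    mean = 2.0 / (n + 2)
    variance = 4.0 * n / ((n + 2) ** 2 * (n + 3) * (n + 4))
    return mean, variance
```

A value of $U^*$ far above the null mean, measured in units of the null standard deviation, would point towards a poor fit; because the distribution is markedly non-normal for small n, such a comparison is only a rough guide.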





The effect of Type II errors can be roughly analyzed as follows: In considering
the effect of such errors the testing procedure must be criticized from the point
of view that

$Q(X) \ne F(X)$.

For m = n + 1 it still is true that

$\sum u_i^* = 1$

which tends to act as a control upon $U^*$. For example set

$u_i^* = u_i + x_i$.

Then from the above relation it follows that

(5.2) $\sum x_i = 0$.

Write $U^*$ as

(5.3) $U^* = \sum u_i^2 + \sum x_i^2 + 2\sum u_i x_i$

$= \sum u_i^2 + \sum x_i^2 + (2\sum x_i)/(n + 1) + 2\sum x_i\, \delta(u_i)$

where $\delta(u_i)$ denotes the variation of the true frequency differences from their
expected value $1/(n + 1)$.

The variation $\delta(u_i)$ will be to a considerable degree independent of $x_i$. Thus
the term $\sum x_i^2$ will in general tend to be larger than the last term on the right.
The third term on the right will be zero by virtue of (5.2), and hence $U^*$ will tend
to be larger than $y_{n+1}$. A similar effect upon the sampling variance of $U^*$ can
be noted. Hence an interval of rejection

$U^* \ge A$, $P[y_{n+1} \ge A] = \alpha$ = confidence level,

is pointed to.

On the other hand if m < n + 1 the condition (5.2) no longer holds, the term
$(2\sum x_i)/(n + 1)$ of (5.3) will not be zero and in many cases would dominate the
other two error terms. Thus it is easily conceivable that one may have in the
case m < n + 1

$U^* < y_m$

even when the discrepancies are large. Hence in the case m < n + 1 choice
of confidence interval will require considerable care (see [1]).

Although the distribution of $y_{n+1}$ for small n is decidedly non-normal, if the
test function is replaced by

(5.4) $r_{n+1} = (\sum [u_i - 1/(n + 1)]^2)^{1/2}$

it will be found that the probability density function takes on the normal
character quite rapidly with increasing n. Indeed the author has found that a
computed approximation to the probability density function of $r_{n+1}$ with n = 4 is
decidedly normal in character.





REFERENCES 

[1] J. Neyman, "Smooth test for goodness of fit," Skand. Aktuar. Tidskr. (1937) p. 149.

[2] E. S. Pearson, "The probability integral transformation for testing goodness of fit
and combining independent tests of significance," Biometrika, Vol. 30 (1938),
pp. 134-148.

[3] E. J. Gumbel, "Simple tests for given hypothesis," Biometrika, Vol. 32 (1942), pp. 317-333.

[4] H. Scheffé and J. W. Tukey, "Non-parametric estimation, I. Validation of order
statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.



AN ESSENTIALLY COMPLETE CLASS OF ADMISSIBLE DECISION 

FUNCTIONS 

By Abraham Wald 
Columbia University 

Summary. With any statistical decision procedure (function) there will be
associated a risk function $r(\theta)$ where $r(\theta)$ denotes the risk due to possible wrong
decisions when $\theta$ is the true parameter point. If an a priori probability
distribution of $\theta$ is given, a decision procedure which minimizes the expected value of
$r(\theta)$ is called the Bayes solution of the problem. The main result in this note
may be stated as follows: Consider the class $C$ of decision procedures consisting
of all Bayes solutions corresponding to all possible a priori distributions of $\theta$.
Under some weak conditions, for any decision procedure $T$ not in $C$ there exists
a decision procedure $T^*$ in $C$ such that $r^*(\theta) \le r(\theta)$ identically in $\theta$. Here $r(\theta)$
is the risk function associated with $T$, and $r^*(\theta)$ is the risk function associated
with $T^*$. Applications of this result to the problem of testing a hypothesis are
made.


1. Introduction. In some previous publications [1], [2] the author has
considered the following general problem of statistical inference: Let
$X = (X_1, \dots, X_n)$ be a set of chance variables. Suppose that the only
information we have concerning the joint distribution function $F$ of these chance
variables is that $F$ is an element of a given class $\Omega$ of distribution functions.
Suppose, furthermore, that a class $D$ of possible decisions $d$ is given one of which
is to be made on the basis of an observation $x = (x_1, \dots, x_n)$ on the chance
vector $X$. The problem is then to construct a function $d(x)$, called statistical
decision function, which associates with each sample point $x$ an element $d(x)$
of $D$ so that the decision $d(x)$ is made when the sample point $x$ is observed. A
statistical decision function $d(x)$ is defined over all possible points $x$ of the sample
space and for each sample point $x$ the value of the function is an element of $D$.
Each element $d$ of $D$ will usually be interpreted as a decision to accept the
hypothesis that the unknown distribution $F$ of $X$ belongs to a certain subclass
$\omega$ of $\Omega$. Different elements $d$ of $D$ correspond to different subclasses $\omega$ of $\Omega$.

The problem of testing the hypothesis $H$ that the unknown distribution
function $F$ belongs to a given subclass $\omega$ of $\Omega$, is contained as a special case in the
above general problem. The space $D$ will then contain only two elements,
$d_1$ and $d_2$, where $d_1$ denotes the decision of accepting $H$ and $d_2$ denotes the
decision of rejecting $H$.

As in [1] and [2], we shall assume also here that $\Omega$ is a k-parameter family of
distribution functions. Then each element of $\Omega$ may be represented by a point
$\theta = (\theta_1, \dots, \theta_k)$, called parameter point, in the k-dimensional Cartesian space.
The class $\Omega$ is then represented by a subset of the k-dimensional Cartesian space,
called parameter space. We shall, therefore, refer to $\Omega$ as the parameter space
and to its elements as parameter points.

The merits of any particular decision function $d(x)$ will usually depend on
the relative importance of the various possible errors caused by not selecting
the proper element $d$ of $D$. The relative importance of such errors has been
described in [1] and [2] by a weight function $W(\theta, d)$ defined over the product of
$\Omega$ and $D$. For any pair $(\theta, d)$ the value of $W(\theta, d)$ is non-negative and expresses
the loss caused by taking the decision $d$ when $\theta$ is the true parameter point.
For any given decision function $d(x)$ the expected value of the loss is given by

(1.1) $r(\theta) = \int_M W[\theta, d(x)]\, dF(x)$

where $M$ denotes the sample space and $F(x)$ is the joint cumulative distribution of
$X = (X_1, \dots, X_n)$ corresponding to the parameter point $\theta$.

The function $r(\theta)$ is defined over the parameter space $\Omega$ and is called the risk
function. The shape of the risk function $r(\theta)$ will, in general, be affected by the
decision function $d(x)$ used. To put this dependence in evidence, we shall use
the symbol $r[\theta \mid d(x)]$ to denote the risk function $r(\theta)$ associated with the
decision function $d(x)$.
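A concrete instance of (1.1) may help fix ideas. The example below is a hypothetical illustration, not taken from the paper: a single observation X is assumed normal with mean $\theta$ and unit variance, the hypothesis H is $\theta \le 0$, the weight function takes only the values 0 and 1, and the decision function rejects H when x exceeds a cutoff. The risk is then the probability of a wrong decision.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def risk(theta, cutoff):
    """Risk r(theta) of the rule 'reject H: theta <= 0 when x > cutoff'.

    Assumptions of this sketch: X ~ N(theta, 1), and the loss is 1 for a
    wrong decision and 0 otherwise.
    """
    p_reject = 1.0 - normal_cdf(cutoff - theta)
    # Rejecting is the error when H is true; accepting is the error otherwise.
    return p_reject if theta <= 0 else 1.0 - p_reject
```

Plotting `risk` against theta for several cutoffs shows how the shape of the risk function depends on the decision function used, which is the dependence the symbol $r[\theta \mid d(x)]$ is meant to record.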

A decision function $d(x)$ is said to be uniformly better than the decision
function $d^*(x)$ if

(1.2) $r[\theta \mid d(x)] \le r[\theta \mid d^*(x)]$

for all $\theta$ and if there exists at least one point $\theta$ for which the strict inequality
sign holds in (1.2). A decision function $d(x)$ is said to be admissible if no other
uniformly better decision function exists.

A class $C$ of admissible decision functions will be said to be essentially complete
if for any decision function $d(x)$ not in $C$ there exists a decision function $d^*(x)$
in $C$ such that

$r[\theta \mid d^*(x)] \le r[\theta \mid d(x)]$

for all $\theta$.

In section 2 we shall formulate certain assumptions which will then be used 
in section 3 to derive an essentially complete class of admissible decision func¬ 
tions. In section 4 applications are made to the problem of testing a hypothesis. 

In a recent paper Lehmann [3] obtained an essentially complete class of
admissible tests for each hypothesis $H$ of a certain restricted class of simple
hypotheses. The restrictions imposed on $\Omega$ in Lehmann's paper are essentially
those formulated by Neyman [4], [5] to insure the existence of the type $A_1$
(uniformly most powerful unbiassed) test. Our definition of an essentially
complete class of admissible decision functions agrees with that given by Lehmann
when the problem is to test a hypothesis and the weight function $W(\theta, d)$ can
take only the values 0 and 1.





2. Assumptions. Throughout this paper we shall make the following
assumptions:

Assumption 1: The parameter space $\Omega$ is a bounded and closed subset of a
finite dimensional, say k-dimensional, Cartesian space.

We shall introduce the following convergence definition in the space $D$: a
sequence $\{d_m\}$, ($m = 1, 2, \dots$, ad inf.), of elements of $D$ is said to converge
to the element $d$ of $D$ if

$\lim_{m \to \infty} W(\theta, d_m) = W(\theta, d)$

uniformly in $\theta$.

Assumption 2: The space $D$ is compact and, for any $d$, $W(\theta, d)$ is a continuous
function of $\theta$.

Assumption 3: For any point $\theta$ of $\Omega$ the joint distribution function of
$X = (X_1, \dots, X_n)$ admits a density function $p(x, \theta)$ for all points $x$ of the
n-dimensional Cartesian space $M$ (sample space). The density function $p(x, \theta)$
is assumed to be continuous in $x$ and $\theta$ jointly.

In what follows we shall mean by a distribution function $f(\theta)$ of $\theta$ a
cumulative distribution function for which $\int_\Omega df(\theta) = 1$ and for which
$\int_\Omega W(\theta, d)\, df(\theta)$ is not zero identically in $d$.

Assumption 4: For any point $x$ of $M$, except perhaps for a set of measure
zero, and for any cumulative distribution function $f(\theta)$ there exists one and
only one element of $D$ for which the expression

(2.1) $\int_\Omega W(\theta, d)\, p(x, \theta)\, df(\theta)$

takes its minimum value with respect to $d$.

Assumptions 1 and 3 in this paper are exactly the same as Assumptions 1 and 3
in [2]. The formulation of Assumptions 2 and 4 is somewhat different from
that given in [2]. This is mainly due to the fact that in [2] the space $D$ has the
same elements as $\Omega$, while here this is not necessarily so. It can be verified
without difficulty that this slight modification of the assumptions does not
affect in any way the validity of the results obtained in [2]. Thus, we shall be
able to make use of any theorems proved in [2] for the purposes of the present
paper.

3. Derivation of an essentially complete class of admissible decision
functions. For any distribution function $f(\theta)$ defined over $\Omega$ and for any sample
point $x$ let $d(x, f)$ denote the element of $D$ for which the expression (2.1) takes
its minimum value. It follows easily from the definition of $r(\theta)$ and $d(x, f)$ that

$$\int_\Omega r[\theta \mid d(x, f)]\,df(\theta) \le \int_\Omega r[\theta \mid d^*(x)]\,df(\theta) \tag{3.1}$$

for any decision function $d^*(x)$. If we interpret $f(\theta)$ as an a priori probability
distribution of $\theta$, inequality (3.1) says that the expected value of $r(\theta)$ takes its
minimum value for the decision function $d(x, f)$. We shall refer to $d(x, f)$ as
the Bayes solution of the problem corresponding to the a priori probability
distribution $f(\theta)$.
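The definition of $d(x, f)$ as the minimizer of (2.1) can be sketched numerically. The code below is only an illustration under assumptions that go beyond the paper: the a priori distribution is taken to be discrete (so the integral in (2.1) becomes a sum), the space $D$ is replaced by a finite grid, and the normal density and squared-error weight function are hypothetical choices made for the example.

```python
import math


def bayes_decision(x, thetas, prior, decisions, weight, density):
    """Return the d minimizing sum over theta of W(theta, d) * p(x, theta) * f(theta)."""

    def expression_2_1(d):
        return sum(weight(t, d) * density(x, t) * w for t, w in zip(thetas, prior))

    return min(decisions, key=expression_2_1)


def normal_density(x, theta):
    """Density of a single observation, normal with mean theta and unit variance."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def squared_error(theta, d):
    """Illustrative weight function W(theta, d)."""
    return (theta - d) ** 2


thetas = [-1.0, 0.0, 1.0]                     # support of the a priori distribution
prior = [0.25, 0.5, 0.25]                     # a priori probabilities f
decisions = [i / 10 for i in range(-10, 11)]  # finite grid standing in for D

print(bayes_decision(0.0, thetas, prior, decisions, squared_error, normal_density))
print(bayes_decision(2.0, thetas, prior, decisions, squared_error, normal_density))
```

With squared error the minimizer is the grid point nearest the a posteriori mean of $\theta$; other weight functions lead to other Bayes solutions, but the minimization step is the same.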

We shall now prove the following theorem.

Theorem 3.1. The class $C$ of all Bayes solutions $d(x, f)$ corresponding to all
possible a priori distributions $f(\theta)$ is an essentially complete class of admissible
decision functions.

Proof. First we show that for any distribution $f(\theta)$ the decision function
$d(x, f)$ is admissible. Let $d(x)$ be a decision function such that

$$r[\theta \mid d(x)] \le r[\theta \mid d(x, f)]$$

for all $\theta$. Then

$$\int_\Omega r[\theta \mid d(x)]\,df(\theta) \le \int_\Omega r[\theta \mid d(x, f)]\,df(\theta). \tag{3.2}$$

From the definition of $d(x, f)$ it follows that the equality sign must hold in
(3.2), i.e.,

$$\int_\Omega r[\theta \mid d(x)]\,df(\theta) = \int_\Omega r[\theta \mid d(x, f)]\,df(\theta). \tag{3.3}$$

From the second half of Theorem 4.2 in [2] it then follows that

$$r[\theta \mid d(x)] = r[\theta \mid d(x, f)]$$

for all $\theta$. Hence $d(x, f)$ is an admissible decision function.

We shall now show that the class $C$ of decision functions $d(x, f)$ corresponding
to all possible a priori distributions $f(\theta)$ is essentially complete. Let $d_0(x)$ be
any decision function not in the class $C$. The essential completeness of the
class $C$ is proved if we can show that there exists a distribution $f(\theta)$ such that

$$r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)] \tag{3.4}$$

for all $\theta$.

To prove (3.4) we shall consider the weight function

$$W^*(\theta, d) = W(\theta, d) - r[\theta \mid d_0(x)] + \max_\theta r[\theta \mid d_0(x)]. \tag{3.5}$$

The maximum of $r[\theta \mid d_0(x)]$ exists, since according to Theorem 4.1 in [2] $r[\theta \mid d_0(x)]$
is a continuous function of $\theta$. Clearly, Assumptions 1-4 remain valid if we
replace $W(\theta, d)$ by $W^*(\theta, d)$. Let $r^*[\theta \mid d(x)]$ denote the risk function associated
with the decision function $d(x)$ if the weight function is given by $W^*(\theta, d)$.
According to Theorem 5.2 in [2] there exists a decision function $d^*(x)$ such that

$$\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r^*[\theta \mid d(x)] \tag{3.6}$$

for any decision function $d(x)$. Since

$$\max_\theta r^*[\theta \mid d_0(x)] = \max_\theta r[\theta \mid d_0(x)]$$

it follows from (3.6) that

$$\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r[\theta \mid d_0(x)]. \tag{3.7}$$

Inequalities (3.5) and (3.7) imply

$$r[\theta \mid d^*(x)] \le r[\theta \mid d_0(x)] \tag{3.8}$$

for all $\theta$.

For any distribution $f(\theta)$ we shall denote by $d^*(x, f)$ the Bayes solution of
the problem corresponding to the a priori distribution $f(\theta)$ when the weight
function is given by $W^*(\theta, d)$. Since $W^*(\theta, d) - W(\theta, d)$ depends only on $\theta$
but not on $d$, one can easily verify that $d^*(x, f) = d(x, f)$. It follows from
Theorems 4.4 and 5.1 in [2] that there exists a distribution $f(\theta)$, the so-called
least favorable distribution, such that (3.6) remains valid if we replace $d^*(x)$
by $d^*(x, f)$. Thus we can put

$$d^*(x) = d^*(x, f) = d(x, f). \tag{3.9}$$

Hence, from (3.8) we obtain

$$r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)]$$

for all $\theta$. This completes the proof of Theorem 3.1.

4. Applications to the problem of testing a hypothesis. In this section we
shall apply the results of the preceding section to the problem of testing the
hypothesis $H$ that the true parameter point is included in a given subset $\omega$ of $\Omega$.
We shall assume that $\omega$ is an open subset of $\Omega$. The space $D$ consists now only
of two elements, $d_1$ and $d_2$, where $d_1$ denotes the decision of accepting $H$ and $d_2$
denotes the decision of rejecting $H$.

We shall assume that $W(\theta, d_1)$ is equal to zero for points $\theta$ in the interior
or on the boundary of $\omega$, and positive elsewhere. Similarly, $W(\theta, d_2)$ will be
assumed to be positive for points $\theta$ inside $\omega$ and zero outside $\omega$. For any a priori
distribution $f(\theta)$ the Bayes solution is given by the following test: We reject
the hypothesis $H$ if (and only if)¹

$$\int_\Omega W(\theta, d_1)\,p(x, \theta)\,df(\theta) > \int_\Omega W(\theta, d_2)\,p(x, \theta)\,df(\theta). \tag{4.1}$$

Thus, the class $C$ of regions (4.1), corresponding to all possible distributions
$f(\theta)$, is an essentially complete class of admissible critical regions.

¹ Whether the equality sign is included or not in (4.1) is of no consequence, since by
Assumption 4 the measure of the set of points $x$ for which the equality holds in (4.1) is zero.

For any critical region $R$ we shall denote the probability that the sample $x$
will fall in $R$ when $\theta$ is true by $P(\theta \mid R)$. It follows from Lemma 4.4 in [2] and
Assumption 3 that $P(\theta \mid R)$ is a continuous function of $\theta$ for any region $R$.
Since $W(\theta, d_1)$ is positive in the interior of $\Omega - \omega$, and $W(\theta, d_2)$ is positive in $\omega$,
the class $C$ of regions defined in (4.1) will have the following properties:

(a) For any region $R$ outside the class $C$ there exists a region $R^*$ in $C$ such that

$$P(\theta \mid R^*) \le P(\theta \mid R) \text{ in } \omega$$

and

$$P(\theta \mid R^*) \ge P(\theta \mid R) \text{ in } \Omega - \omega.$$

(b) If $R$ and $R^*$ are members of $C$ such that

$$P(\theta \mid R^*) \le P(\theta \mid R) \text{ in } \omega$$

and

$$P(\theta \mid R^*) \ge P(\theta \mid R) \text{ in } \Omega - \omega,$$

then

$$P(\theta \mid R^*) = P(\theta \mid R) \text{ for all } \theta.$$

For any distribution $g(\theta)$ consider the critical region consisting of all sample
points $x$ satisfying

$$\int_{\Omega - \omega} p(x, \theta)\,dg(\theta) > \int_{\omega} p(x, \theta)\,dg(\theta). \tag{4.2}$$

Let $C^*$ be the class of regions (4.2) corresponding to all possible distributions $g(\theta)$.
One can easily verify that any region in $C$ is also a member of $C^*$. Thus, the
following theorem holds:

Theorem 4.1. Suppose that Assumptions 1 and 3 are fulfilled and $\omega$ is an open
subset of $\Omega$. Suppose, furthermore, that for any distribution $g(\theta)$ the set of sample
points $x$ satisfying the equation

$$\int_{\Omega - \omega} p(x, \theta)\,dg(\theta) = \int_{\omega} p(x, \theta)\,dg(\theta)$$

has the measure zero. Then, for any region $R$ outside the class $C^*$ there will be a
region $R^*$ in $C^*$ such that

$$P(\theta \mid R^*) \le P(\theta \mid R) \text{ in } \omega$$

and

$$P(\theta \mid R^*) \ge P(\theta \mid R) \text{ in } \Omega - \omega.$$

Addition at proof reading: After this paper was sent to the printer, the author 
obtained a generalization of Theorem 3.1 to sequential decision functions, as well as 
some other results. They will appear in a forthcoming issue of Econometrica. 





REFERENCES

[1] A. Wald, “Contributions to the theory of statistical estimation and testing hypotheses”, Annals of Math. Stat., Vol. 10 (1939).

[2] A. Wald, “Statistical decision functions which minimize the maximum risk”, Annals of Math., Vol. 46 (1945).

[3] E. L. Lehmann, “On families of admissible tests”, Annals of Math. Stat., Vol. 18 (1947), pp. 97-104.

[4] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical hypotheses, Part II”, Statistical Research Memoirs, Vol. II, London, 1938.

[5] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical hypotheses, Part I”, Statistical Research Memoirs, Vol. I, London, 1936.



DISCRIMINATING BETWEEN BINOMIAL DISTRIBUTIONS 

By Paul G. Hoel 
University of California at Los Angeles 

1. Summary. Given a set of $k$ random samples, $x_1, x_2, \ldots, x_k$, from a
binomial distribution with parameters $p$ and $n$, it is shown that the familiar
binomial index of dispersion

$$z = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n_0}\right)}$$

yields an approximate best critical region independent of $p$ for testing the
hypothesis $n = n_0$ against the alternative hypothesis $n > n_0$, provided $\bar{x}$ and
$n_0 - \bar{x}$ are not small. Because of the nature of the test, its optimum properties
also apply to testing whether the data came from a binomial population with
$n = n_0$ or from a Poisson population.

2. Introduction. A problem of considerable interest in certain fields is that
of deciding whether a set of observations should be treated as having come
from a binomial population or from a Poisson population. Although there was
much discussion a few years ago concerning the best method for making such a 
decision [1], [2], [3], no solution of the problem was presented. In this paper a 
test that possesses certain optimum properties is derived for discriminating 
between two binomial populations. This test, however, is also capable of solving 
the problem of how to discriminate between a binomial and a Poisson population. 
The methods that are employed in the derivation of this test are similar to those 
of an earlier paper [4] in which the problem of discriminating between two Poisson 
populations was studied. 


3. Similar regions. Let $n$ denote the number of trials and $p$ the probability
of success in a single trial for a binomial distribution. Let $x_1, x_2, \ldots, x_k$
represent the observed frequencies in $k$ random samples from this binomial population.
Now consider the two alternative hypotheses

$$H_0: n = n_0, \quad p = p_0$$

and

$$H_1: n = n_1 > n_0, \quad p = p_1.$$

The purpose of this paper is to construct a test for discriminating between the two
values of $n$ regardless of the values of $p$; however, it is convenient to begin with
these more restrictive hypotheses.




For the purpose of finding a critical region for testing $H_0$ against $H_1$, the $x_i$
will be treated as the coordinates of a point in $k$ dimensions. The probability of
obtaining the particular point $x_1, \ldots, x_k$ when $H_0$ is true will be denoted by
$P_0[x_i]$. Since the probability of obtaining $x$ successes in $n$ trials is given by

$$\frac{n!}{x!\,(n - x)!}\,p^x q^{n - x},$$

it follows that

$$P_0[x_i] = \frac{(n_0!)^k}{\prod_{i=1}^{k} x_i!\,(n_0 - x_i)!}\; p_0^{\sum x_i}\, q_0^{\sum (n_0 - x_i)}. \tag{1}$$

In searching for a critical region that will be independent of $p_0$, it is
illuminating to study the methods that were designed by Neyman and Pearson [5] for
continuous distributions. These methods suggest that one should look for
critical regions on the surfaces $\sum_{i=1}^{k} x_i = \text{constant}$. For this reason, instead of
using (1) for constructing critical regions, it is desirable to study the conditional
probability distribution of the points lying in the plane $\sum_{i=1}^{k} x_i = N$, where $N$ is a
positive integer not exceeding $kn_0$. The conditional probability of obtaining
the point $x_1, \ldots, x_k$, when the point is restricted to lie in the plane $\sum_{i=1}^{k} x_i = N$,
will be denoted by $P_0[x_i \mid N]$. Its value may be obtained by dividing the
probability (1) by the probability that the point will lie in the plane $\sum_{i=1}^{k} x_i = N$. If
this latter probability is denoted by $P_0[N]$, then

$$P_0[x_i \mid N] = \frac{P_0[x_i]}{P_0[N]}. \tag{2}$$

Since the sum of $k$ independent variables each possessing the same binomial
distribution has a binomial distribution with $n$ replaced by $kn$, it follows that $N$
possesses a binomial distribution and that

$$P_0[N] = \frac{(kn_0)!}{N!\,(kn_0 - N)!}\; p_0^{N} q_0^{kn_0 - N}. \tag{3}$$

If (1) and (3) are substituted in (2), it will reduce to

$$P_0[x_i \mid N] = \frac{(n_0!)^k\, N!\,(kn_0 - N)!}{(kn_0)!\,\prod_{i=1}^{k} x_i!\,(n_0 - x_i)!}. \tag{4}$$


This conditional probability distribution in the plane $\sum_{i=1}^{k} x_i = N$ is independent
of $p_0$ and therefore may serve as the basis for constructing a critical region that
is independent of $p_0$ for testing $H_0$ against $H_1$. It will therefore be possible to
test the less restrictive hypothesis

$$H_0': n = n_0$$

against

$$H_1': n = n_1 > n_0.$$
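The cancellation of $p_0$ in (4) can be checked numerically. The sketch below is an illustration only; the function names are hypothetical, and (4) is rewritten in the equivalent form $\prod_i \binom{n_0}{x_i} \big/ \binom{kn_0}{N}$.

```python
from math import comb, prod


def joint_prob(xs, n, p):
    """Probability (1): k independent binomial(n, p) counts equal x_1, ..., x_k."""
    return prod(comb(n, x) * p**x * (1 - p) ** (n - x) for x in xs)


def total_prob(total, k, n, p):
    """Probability (3): the sum of the k counts equals `total`, a binomial(kn, p) event."""
    return comb(k * n, total) * p**total * (1 - p) ** (k * n - total)


def conditional_prob(xs, n):
    """Probability (4): no p appears, only binomial coefficients."""
    return prod(comb(n, x) for x in xs) / comb(len(xs) * n, sum(xs))


xs, n0 = [2, 3, 1], 5
for p in (0.2, 0.7):
    ratio = joint_prob(xs, n0, p) / total_prob(sum(xs), len(xs), n0, p)
    print(p, ratio, conditional_prob(xs, n0))
```

Both values of $p$ give the same ratio (1)/(3), which agrees with the closed form (4) up to rounding error.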


4. Best critical region. Although a best critical region does not exist for
testing $H_0'$ against $H_1'$, it is helpful to proceed as though one did.

If a critical region of size $\alpha$ could be selected in each plane $\sum_{i=1}^{k} x_i = N$,
($N = 0, 1, \ldots, kn_0$), then the totality of such critical regions would constitute
a critical region of size $\alpha$ that is independent of $p_0$ and which therefore could be
used to test $H_0'$ against $H_1'$. For, if $P_0[X \in \text{C.R.}]$ denotes the probability that
the sample point, which will be denoted by $X$, will lie in the critical region, it
follows that

$$P_0[X \in \text{C.R.}] = \sum_{N=0}^{kn_0} P_0[N]\,P_0[X \in \text{C.R.} \mid N] = \sum_{N=0}^{kn_0} P_0[N]\,\alpha = \alpha. \tag{5}$$

This last equality follows from the fact that the sample point must lie in one of
the planes $\sum_{i=1}^{k} x_i = N$, ($N = 0, 1, \ldots, kn_0$).

Furthermore, this would be the only critical region of size $\alpha$ independent of
$p_0$, because if a critical region of size $\alpha_N$, ($N = 0, 1, \ldots, kn_0$), were selected in
the plane $\sum_{i=1}^{k} x_i = N$, ($N = 0, 1, \ldots, kn_0$), it would be necessary that

$$\sum_{N=0}^{kn_0} P_0[N]\,\alpha_N = \alpha,$$

independent of the value of $p_0$. From (3) this is equivalent to requiring that

$$\sum_{N=0}^{kn_0} \frac{(kn_0)!}{N!\,(kn_0 - N)!}\; p_0^{N} (1 - p_0)^{kn_0 - N}\,\alpha_N = \alpha, \tag{6}$$

independent of the value of $p_0$. Since the left side of (6) is a polynomial in $p_0$,
its constant term must equal $\alpha$ and all other coefficients must vanish. It will be
observed that no terms of the sum in (6) that arise from $N > r$ will contribute to
the coefficient of $p_0^r$; consequently this coefficient will not contain the unknowns
$\alpha_{r+1}, \ldots, \alpha_{kn_0}$. These considerations show that the $\alpha_N$ must satisfy
equations of the form

$$\begin{aligned}
\alpha &= c_{00}\,\alpha_0 \\
0 &= c_{10}\,\alpha_0 + c_{11}\,\alpha_1 \\
&\;\;\vdots \\
0 &= c_{kn_0,0}\,\alpha_0 + c_{kn_0,1}\,\alpha_1 + \cdots + c_{kn_0,kn_0}\,\alpha_{kn_0}.
\end{aligned}$$

It will also be observed that $c_{rr} = (kn_0)!/r!\,(kn_0 - r)!$; consequently the triangular
matrix of the coefficients in these $kn_0 + 1$ non-homogeneous equations is
non-singular. The equations therefore possess a unique solution, namely the known
solution of $\alpha_N = \alpha$.

The preceding discussion shows that it is necessary to find critical regions of
size $\alpha$ in each plane $\sum_{i=1}^{k} x_i = N$, ($N = 0, 1, \ldots, kn_0$), if a critical region
independent of $p_0$ is desired. If each such planar critical region were a best critical
region for that plane, then the totality of such regions would constitute a
best critical region independent of $p_0$ for testing $H_0'$ against $H_1'$.

It follows from the theory of best critical regions [5] that if a best critical region
in the plane $\sum_{i=1}^{k} x_i = N$ did exist, it would be determined by the inequality

$$\frac{P_0[x_i \mid N]}{P_1[x_i \mid N]} \le K, \tag{7}$$

where $P_1$ corresponds to $P_0$ when $H_1$ is true and where $K$ is a constant whose value
is chosen to make the critical region one of size $\alpha$. Now from (4),

$$\frac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = \frac{(n_0!)^k\,(kn_0 - N)!\,(kn_1)!\,\prod (n_1 - x_i)!}{(n_1!)^k\,(kn_1 - N)!\,(kn_0)!\,\prod (n_0 - x_i)!}. \tag{8}$$

In order to study the possibility of a best critical region, it is therefore
necessary to study the possibility of (8) satisfying inequality (7).


5. Approximate best critical region. Unfortunately, because the variables $x_i$
are discrete, it is not possible to find critical regions of exactly size $\alpha$ for arbitrary
$\alpha$ as required in (5). Consequently it is necessary to introduce continuous
approximating functions for discrete probability functions or to resort to other
devices if critical regions of the type discussed in the preceding section are to
be obtained.

For the purpose of introducing such approximations, (8) will be written in
the following form:

$$\frac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = c_1\; \frac{\dfrac{(kn_0 - N)!}{\prod (n_0 - x_i)!}\left(\dfrac{1}{k}\right)^{kn_0 - N}}{\dfrac{(kn_1 - N)!}{\prod (n_1 - x_i)!}\left(\dfrac{1}{k}\right)^{kn_1 - N}}, \tag{9}$$

where $c_1$ is independent of the variables $x_i$. It will be observed that the ratio
on the right is a ratio of two multinomial functions. Now the multinomial
function


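$$\frac{N!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k},$$

where $\sum x_i = N$, can be approximated by the multivariate normal function

$$\frac{\exp\left[-\dfrac{1}{2} \displaystyle\sum_{i=1}^{k} \dfrac{(x_i - Np_i)^2}{Np_i}\right]}{(2\pi N)^{\frac{1}{2}(k-1)} \sqrt{p_1 p_2 \cdots p_k}}. \tag{10}$$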


The approximation is good provided the $Np_i$ are large and the $x_i$ remain away
from their extreme values. If this approximation is applied to both numerator
and denominator of (9), to this order of approximation,


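$$\frac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = c_1\, \frac{\dfrac{k^{k/2} \exp\left[-\dfrac{1}{2} \displaystyle\sum_{i=1}^{k} \dfrac{(x_i - N/k)^2}{n_0 - N/k}\right]}{[2\pi(kn_0 - N)]^{\frac{1}{2}(k-1)}}}{\dfrac{k^{k/2} \exp\left[-\dfrac{1}{2} \displaystyle\sum_{i=1}^{k} \dfrac{(x_i - N/k)^2}{n_1 - N/k}\right]}{[2\pi(kn_1 - N)]^{\frac{1}{2}(k-1)}}} = c_1 \left[\frac{kn_1 - N}{kn_0 - N}\right]^{\frac{1}{2}(k-1)} \exp\left[-\frac{1}{2}\, \frac{n_1 - n_0}{(n_1 - N/k)(n_0 - N/k)} \sum_{i=1}^{k} (x_i - N/k)^2\right]. \tag{11}$$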


Since, by hypothesis, $n_1 > n_0$ and $n_0 > N/k$, except for the case of $n_0 = N/k$,
which will be considered later, it follows that

$$\frac{n_1 - n_0}{(n_1 - N/k)(n_0 - N/k)} > 0.$$

As a consequence, the right side of (11) will decrease in value as $\sum_{i=1}^{k} (x_i - N/k)^2$
increases in value. If $(x_1, \ldots, x_k)$ is a point lying on the sphere

$$\sum_{i=1}^{k} (x_i - N/k)^2 = R \tag{12}$$

and if the coordinates of this point satisfy inequality (7) when approximation
(11) is used, then all points outside this sphere will also satisfy (7) to this same
order of approximation. A best critical planar region of size $\alpha$ in this
approximate sense can therefore be obtained in the plane $\sum x_i = N$ by determining a
sphere with center at $(N/k, \ldots, N/k)$ such that when $H_0$ is true the probability
is $\alpha$ that a point lying in the plane will lie outside this sphere. Furthermore, such
a region will be a common best critical region for all values of $n_1 > n_0$ because
the preceding arguments do not require the value of $n_1$ but merely the knowledge
that $n_1 > n_0$.

For the purpose of determining the radius of the sphere that will yield the
desired critical region, (4) will be expressed as follows:

$$P_0[x_i \mid N] = c_2 \left[\frac{N!}{\prod x_i!} \left(\frac{1}{k}\right)^{N}\right] \left[\frac{(kn_0 - N)!}{\prod (n_0 - x_i)!} \left(\frac{1}{k}\right)^{kn_0 - N}\right], \tag{13}$$

where $c_2$ is independent of the $x_i$. If these multinomials are replaced by their
multivariate normal approximations as given by (10), to this approximation
(13) will reduce to

$$P_0[x_i \mid N] = c_3 \exp\left[-\frac{1}{2}\, \frac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)}\right], \tag{14}$$

where $c_3$ is independent of the $x_i$. Since $\sum_{i=1}^{k} x_i = N$ here, $x_k$ may be expressed
in terms of the remaining variables; consequently (14), except for a constant
factor, may be treated as a normal distribution in the variables $x_1, \ldots, x_{k-1}$.
If the factorials in $c_3$ are replaced by their Stirling approximations, it will be
found that $c_3$ is the correct constant for the normal distribution.

Since it is known [6] that $-2$ times the exponent in a normal distribution
function possesses a chi-square distribution, it follows that to this order of
approximation

$$\frac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)} \tag{15}$$

possesses a chi-square distribution with $k - 1$ degrees of freedom. If $\chi^2_\alpha$ is a
value such that $P[\chi^2 > \chi^2_\alpha] = \alpha$, then

$$\frac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)} = \chi^2_\alpha \tag{16}$$

determines a sphere such that to this order of approximation the probability is
$\alpha$ that a point lying in the plane $\sum x_i = N$ will lie outside the sphere. From
the arguments following (12), it therefore follows that a common best critical
region in this approximate sense for testing $H_0'$ against $H_1'$ will consist of that
part of each plane $\sum_{i=1}^{k} x_i = N$, ($N = 0, 1, \ldots, kn_0$), which lies outside the
corresponding sphere given by (16). Since the $x_i$ are non-negative and do not
exceed $n_0$, the planes corresponding to $N = 0$ and $N = kn_0$ contain a single
point; therefore it is necessary to adopt some convention that assigns $100\alpha$ percent
of the samples with $N = 0$ and $N = kn_0$ to a critical region in order to obtain
critical regions of size $\alpha$ in these two cases.

For a given set of data, the procedure to be followed then consists in
calculating the statistic

$$z = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n_0}\right)},$$

where $\bar{x} = \sum x_i / k$, and agreeing to reject the hypothesis that $n = n_0$ in
favor of the alternative hypothesis that $n > n_0$ if and only if $z > \chi^2_\alpha$, where
$P[\chi^2 > \chi^2_\alpha] = \alpha$ for $k - 1$ degrees of freedom. Because of the nature of the
approximations used in (10) and (14), this result may be expected to be accurate
only if $\bar{x}$ and $n_0 - \bar{x}$ are large.

The interesting feature of this result is that the familiar binomial index of
dispersion, $z$, possesses optimum properties in this approximate sense for testing
$n = n_0$ against $n > n_0$.
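The procedure just described can be sketched in a few lines. This is an illustration only: the function names and the sample data are hypothetical, and the critical value $\chi^2_\alpha$ is passed in by the caller (here the tabled 5 percent point for 4 degrees of freedom, 9.488) rather than computed.

```python
def binomial_dispersion_index(xs, n0):
    """z = sum (x_i - xbar)^2 / (xbar * (1 - xbar / n0)), the binomial index of dispersion."""
    k = len(xs)
    xbar = sum(xs) / k
    sum_sq = sum((x - xbar) ** 2 for x in xs)
    # Undefined when xbar is 0 or n0; these are the single-point planes N = 0 and N = k*n0.
    return sum_sq / (xbar * (1 - xbar / n0))


def reject_n_equals_n0(xs, n0, chi2_critical):
    """Reject n = n0 in favor of n > n0 if and only if z exceeds the supplied critical value."""
    return binomial_dispersion_index(xs, n0) > chi2_critical


xs = [3, 5, 4, 6, 2]  # k = 5 hypothetical observed frequencies
z = binomial_dispersion_index(xs, n0=10)
print(z, reject_n_equals_n0(xs, n0=10, chi2_critical=9.488))
```

For these data $\bar{x} = 4$, $\sum (x_i - \bar{x})^2 = 10$ and $z = 10/2.4 \approx 4.17$, so $n = 10$ is not rejected at the 5 percent level. The data are too small for the normal approximation to be trusted and serve only to show the arithmetic.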

6. Poisson application. Since the preceding test will possess approximate
optimum properties for $n$ as large as desired, independent of the value of $p$,
and since a Poisson distribution with parameter $m$ can be approximated as
closely as desired by means of a binomial distribution with $np = m$ by allowing
$n$ to increase sufficiently, it follows that the test will also possess approximate
optimum properties for deciding between a binomial distribution with $n = n_0$
and a Poisson distribution.


7. Estimation of n. Although the purpose of this paper has been accomplished
in the preceding sections, it is interesting to observe the role played by the closely
related Poisson index of dispersion in the estimation of $n$.

Approximate confidence limits for $n$ may be obtained by means of (16).
If $\chi^2_{1-\alpha}$ is a value of $\chi^2$ such that $P[\chi^2 > \chi^2_{1-\alpha}] = 1 - \alpha$, then, to this same order
of approximation, the probability is $1 - 2\alpha$ that

$$\chi^2_{1-\alpha} < \frac{\sum (x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n}\right)} < \chi^2_\alpha.$$

If these inequalities are solved for $n$, the following $100(1 - 2\alpha)$ percent
approximate confidence limits for $n$ will be obtained:

$$\frac{\bar{x}\,\chi^2_\alpha}{\chi^2_\alpha - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}}} < n < \frac{\bar{x}\,\chi^2_{1-\alpha}}{\chi^2_{1-\alpha} - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}}}. \tag{17}$$


Only the lower limit here will possess optimum properties. Now it will be
observed that only positive values of $n$ will be admissible if

$$\frac{\sum (x_i - \bar{x})^2}{\bar{x}} \le \chi^2_{1-\alpha},$$

whereas only negative values will be admissible if

$$\frac{\sum (x_i - \bar{x})^2}{\bar{x}} \ge \chi^2_\alpha.$$

The range of values will be infinite in each case if there is equality rather than
inequality. If, however,

$$\chi^2_{1-\alpha} < \frac{\sum (x_i - \bar{x})^2}{\bar{x}} < \chi^2_\alpha,$$

then both positive and negative values of $n$ over infinite ranges will be admissible.
Since $n$ increases as the Poisson index $\sum (x_i - \bar{x})^2 / \bar{x}$ increases until it becomes
infinite and then increases from minus infinity through negative values, (17)
may still be thought of as giving an interval (infinite) of values with a positive
“lower” limit and a negative “upper” limit. Thus, the familiar Poisson index
of dispersion plays an interesting role in determining whether a Poisson
assumption is reasonable as far as admissible values of $n$ are concerned.

If the population is truly binomial, negative values of $n$ must be ruled out;
consequently a Poisson assumption becomes increasingly tenable as the Poisson
index increases. However, experience has shown [7] that a negative binomial
distribution is often more realistic in describing data supposedly drawn from a
binomial or Poisson population than is the assumed distribution; consequently
a negative binomial should be given consideration if (17) yields only negative
values or if it yields a negative “upper” limit that is numerically small relative
to a positive “lower” limit.

It is also interesting to consider the point estimation of $n$. Here, it is
customary [7] to estimate $n$ by means of

$$\frac{k\bar{x}}{k - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}}}.$$

Thus, a positive, infinite, or negative estimate for $n$ will be obtained according as
the Poisson index is less than, equal to, or greater than $k$.
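The limits (17) and the point estimate above can be computed as follows. This is a sketch only: the function names and data are hypothetical, the chi-square values are supplied by the caller (here the tabled values 9.488 and 0.711 for 4 degrees of freedom and $\alpha = 0.05$), and the boundary cases in which a denominator vanishes are not handled.

```python
def poisson_index(xs):
    """Poisson index of dispersion: sum (x_i - xbar)^2 / xbar."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / xbar


def confidence_limits_for_n(xs, chi2_alpha, chi2_one_minus_alpha):
    """The two limits in (17); a negative second value is the negative 'upper' limit."""
    xbar = sum(xs) / len(xs)
    index = poisson_index(xs)
    lower = xbar * chi2_alpha / (chi2_alpha - index)
    upper = xbar * chi2_one_minus_alpha / (chi2_one_minus_alpha - index)
    return lower, upper


def point_estimate_of_n(xs):
    """k * xbar / (k - Poisson index); positive only when the index is below k."""
    k = len(xs)
    xbar = sum(xs) / k
    return k * xbar / (k - poisson_index(xs))


xs = [3, 5, 4, 6, 2]
print(poisson_index(xs))
print(confidence_limits_for_n(xs, chi2_alpha=9.488, chi2_one_minus_alpha=0.711))
print(point_estimate_of_n(xs))
```

Here the Poisson index is 2.5, which lies between the two chi-square values, so (17) gives a positive "lower" limit (about 5.43) and a negative "upper" limit (about -1.59), the infinite-interval case described in the text. The point estimate is 8 because the index is less than $k = 5$.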





REFERENCES

[1] J. Berkson, “Some difficulties of interpretation encountered in the application of the chi-square test,” Jour. Amer. Stat. Assoc., Vol. 33 (1938), pp. 526-536.

[2] B. H. Camp, “Further interpretations of the chi-square test,” Jour. Amer. Stat. Assoc., Vol. 33 (1938), pp. 537-542.

[3] J. Berkson, “A note on the chi-square test, the Poisson, and the binomial,” Jour. Amer. Stat. Assoc., Vol. 35 (1940), pp. 362-367.

[4] P. G. Hoel, “Testing the homogeneity of Poisson frequencies,” Annals of Math. Stat., Vol. 16 (1945), pp. 362-368.

[5] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Roy. Soc. Phil. Trans., Vol. 231 (1933), pp. 289-337.

[6] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 104.

[7] “Student,” “An explanation of deviations from Poisson’s law in practice,” Biometrika, Vol. 12 (1919), pp. 211-215.



BILINEAR FORMS IN NORMALLY CORRELATED VARIABLES 

By Allen T. Craig 
University of Iowa 

1. Summary. If a variable x is normally distributed with mean zero, we have 
previously given a necessary and sufficient condition (see references at end of 
this paper) for the independence of two real symmetric quadratic forms in n 
independent values of that variable. This condition is that the product of the 
matrices of the forms should vanish. In the present paper, we have proved 
that the same algebraic condition is both necessary and sufficient for the inde¬ 
pendence of two real symmetric bilinear, or a real symmetric bilinear and 
quadratic form, in normally correlated variables. 

2. Introduction. In this paper, we determine the moment generating function
of the joint distribution of two real symmetric bilinear forms in certain normally
correlated variables and derive a necessary and sufficient condition for the
independence, in the probability sense, of these forms. We further investigate
the condition for independence, in the probability sense, of real symmetric
bilinear and quadratic forms.

3. The moment generating function of the distribution of real symmetric
bilinear forms. Let the two variables $x$ and $y$ have a joint normal distribution
with means zero, unit variances and correlation coefficient $\rho$. From this
bivariate distribution, repeated random samples of $n$ pairs, say $(x_1, y_1), (x_2, y_2),
\ldots, (x_n, y_n)$, are drawn. Let $C = \|c_{jk}\|$ be a real symmetric matrix and write
$\theta = \sum\sum c_{jk} x_j y_k$. The moment generating function of the distribution of $\theta$
is then given by

$$\varphi(t) = E[e^{t\theta}] = \left(2\pi\sqrt{1 - \rho^2}\right)^{-n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t\theta - Q}\, dy_n\, dx_n \cdots dy_1\, dx_1,$$

where

$$Q = \frac{1}{2(1 - \rho^2)} \sum_{j=1}^{n} \left(x_j^2 + y_j^2 - 2\rho x_j y_j\right)$$

and $\theta$ is defined above. If we subject the $x$’s and $y$’s to the same linear
homogeneous transformation with appropriately chosen orthogonal matrix $L$, then
$Q$ remains invariant and $\theta$ becomes $\sum_j \lambda_j x_j' y_j'$, where the $\lambda$’s are the $n$ real roots of
the characteristic equation of $C$, that is, of $|C - \lambda I| = 0$. The integrations
are then easily effected and we find that

$$\begin{aligned}
\varphi(t) &= \Big\{\prod_{j=1}^{n} [1 - t(\rho + 1)\lambda_j][1 - t(\rho - 1)\lambda_j]\Big\}^{-1/2} \\
&= \big\{\,|I - t(\rho + 1)C| \cdot |I - t(\rho - 1)C|\,\big\}^{-1/2} \\
&= |I - 2\rho tC - (1 - \rho^2)t^2C^2|^{-1/2},
\end{aligned}$$

where $I$ is the unit matrix of order $n$ and the vertical bars, as usual, indicate the
determinant of the enclosed matrix.

Next, let $A = \|a_{jk}\|$ and $B = \|b_{jk}\|$ be two real symmetric matrices each of
order $n$. Write $\theta_1 = \sum\sum a_{jk} x_j y_k$ and $\theta_2 = \sum\sum b_{jk} x_j y_k$ where the $x$’s and $y$’s are the
items of the sample randomly drawn from the bivariate distribution previously
described. The moment generating function of the joint distribution of $\theta_1$ and $\theta_2$
is then given by

$$\varphi(t_1, t_2) = \left(2\pi\sqrt{1 - \rho^2}\right)^{-n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t_1\theta_1 + t_2\theta_2 - Q}\, dy_n\, dx_n \cdots dy_1\, dx_1,$$

where $\theta_1$, $\theta_2$, and $Q$ have the meanings previously assigned to them. If we
pursue a line of reasoning similar to that above, we find that

$$\varphi(t_1, t_2) = |I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1A + t_2B)^2|^{-1/2}.$$

4. The independence of bilinear forms. It is clear that there exist positive
numbers, say $h_1$ and $h_2$, such that $\varphi(t_1, t_2)$ exists for $0 < t_1 < h_1$ and $0 < t_2 < h_2$.
It is well known that a necessary and sufficient condition for the independence
of $\theta_1$ and $\theta_2$ is that $\varphi(t_1, t_2)$ shall factor into the product $\varphi(t_1, 0)\varphi(0, t_2)$. If then,
we assume $\theta_1$ and $\theta_2$ to be independent, we have essentially

$$|I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1A + t_2B)^2| = |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_2B - (1 - \rho^2)t_2^2B^2|. \tag{1}$$

If $h$ denotes the smaller of $h_1$ and $h_2$, then the factored form holds for
$0 < t_1, t_2 < h$, and hence for all real values of $t_1$ and $t_2$. In particular it holds
for $t_2 = t_1$ so that

$$|I - 2\rho t_1(A + B) - (1 - \rho^2)t_1^2(A + B)^2| = |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_1B - (1 - \rho^2)t_1^2B^2|.$$
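The determinant factorization that independence requires can be explored on small matrices with exact rational arithmetic. The sketch below is an illustration only: the matrices, the values of $\rho$, $t_1$, $t_2$ and all function names are hypothetical, and a single check at one pair $(t_1, t_2)$ is of course not a proof. It compares the determinants themselves, which is equivalent to comparing their $-\tfrac{1}{2}$ powers. The first pair of matrices satisfies $AB = 0$, the condition named in the Summary; the second does not.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def mgf_determinant(A, B, rho, t1, t2):
    """|I - 2 rho M - (1 - rho^2) M^2| with M = t1*A + t2*B."""
    n = len(A)
    M = [[t1 * A[i][j] + t2 * B[i][j] for j in range(n)] for i in range(n)]
    M2 = matmul(M, M)
    return det(
        [[(1 if i == j else 0) - 2 * rho * M[i][j] - (1 - rho**2) * M2[i][j]
          for j in range(n)] for i in range(n)]
    )


def determinant_factors(A, B, rho, t1, t2):
    """Check the factorization at one point (t1, t2), exactly when inputs are rational."""
    joint = mgf_determinant(A, B, rho, t1, t2)
    return joint == mgf_determinant(A, B, rho, t1, 0) * mgf_determinant(A, B, rho, 0, t2)


A = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
B_orth = [[0, 0, 0], [0, 0, 0], [0, 0, 2]]  # A times B_orth is the zero matrix
rho, t1, t2 = Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)

print(determinant_factors(A, B_orth, rho, t1, t2))  # pair with AB = 0
print(determinant_factors(A, A, rho, t1, t2))       # pair with AB != 0
```

For the second pair the joint determinant is 0.13 while the product of the marginal determinants is 0.3696, so the factorization fails there.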

Let $r_1$, $r_2$, and $r \le r_1 + r_2$ denote the ranks of the matrices $A$, $B$, and $A + B$.
Further let the real non-zero roots of the characteristic equations of these
matrices be denoted respectively by $\alpha_1, \alpha_2, \ldots, \alpha_{r_1}$, $\beta_1, \beta_2, \ldots, \beta_{r_2}$, and $\gamma_1, \gamma_2,
\ldots, \gamma_r$. Then the members of the preceding equation may be written

$$\prod_{i=1}^{r} [1 - t_1(\rho + 1)\gamma_i][1 - t_1(\rho - 1)\gamma_i]$$

and

$$\prod_{i=1}^{r_1} [1 - t_1(\rho + 1)\alpha_i][1 - t_1(\rho - 1)\alpha_i] \prod_{i=1}^{r_2} [1 - t_1(\rho + 1)\beta_i][1 - t_1(\rho - 1)\beta_i]$$

respectively. It is seen that the left member is a polynomial in $t_1$ of degree $2r$
and that the right member is a polynomial in $t_1$ of degree $2(r_1 + r_2)$.
Accordingly, $r = r_1 + r_2$ and the roots $\gamma_1, \ldots, \gamma_r$ consist of the roots $\alpha_1, \ldots, \alpha_{r_1}$,
$\beta_1, \ldots, \beta_{r_2}$. That is, if $\theta_1$ and $\theta_2$ are independent, then the rank of $A + B$
is the sum of the ranks of $A$ and $B$ and the non-zero roots of the characteristic
equation of $A + B$ consist of those of the characteristic equation of $A$ together
with those of $B$. Further, if in (1) we put $t_2 = vt_1$, where $v$ is real, we have

$$|I - 2\rho t_1(A + vB) - (1 - \rho^2)t_1^2(A + vB)^2| = |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_1vB - (1 - \rho^2)t_1^2v^2B^2|.$$

Denote the rank of $A + vB$ by $r'$ and the non-zero roots of its characteristic
equation by $\delta_1, \ldots, \delta_{r'}$. The immediately preceding equation can then be
written

$$\prod_{i=1}^{r'} [1 - t_1(\rho + 1)\delta_i][1 - t_1(\rho - 1)\delta_i] = \prod_{i=1}^{r_1} [1 - t_1(\rho + 1)\alpha_i][1 - t_1(\rho - 1)\alpha_i] \prod_{i=1}^{r_2} [1 - t_1(\rho + 1)v\beta_i][1 - t_1(\rho - 1)v\beta_i].$$

From this we infer that, apart from zero roots, the roots of the characteristic
equation of $A + vB$ are $\alpha_1, \ldots, \alpha_{r_1}$, $v\beta_1, \ldots, v\beta_{r_2}$.
If a symmetric matrix, say M(v), has elements which are real polynomials 
in the real variable v, and if the determinant 


| M(v) — X/ | = ( 1) n [X - Pl (iO][X - Pi(v)] • • • [X - p n (t>)], 


where pi(v), p 2 W, • • • , p n (v) are likewise real polynomials in v , then there exists, 
for all real values of v, a real orthogonal matrix, say L(v), such that 


L'{v)M{v)L{v) = 


PlW 

0 •• 

0 

0 

P2M 


0 


Pn(v) 


Furthermore 1 , exists for all real values of v. Since 

dv 

| A + vB - \I | = (-l) n \ Hri ^(X - ad • • • (X - «rj)(X - t;ft) ... (X - V0 ri ), 

$^1$ A number of years ago, in connection with another problem, the writer sought
the assistance of Professor N. H. McCoy for a proof that $L(v)$ is differentiable
at $v = 0$. Professor McCoy's elegant demonstration of the existence of $L(v)$
showed that each element of this orthogonal matrix is itself a real polynomial
in $v$, divided by the positive square root of another real polynomial, which
polynomial is never negative and which vanishes for no real value of $v$. Thus
the derivative of $L(v)$ exists not only for $v = 0$ but for all real values
of $v$. The writer thanks Professor McCoy for his kind and generous assistance.





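then $A + vB$ belongs to the class $M(v)$ so we have

$$L'(v)(A + vB)L(v) = \operatorname{diag}(\alpha_1, \cdots, \alpha_{r_1}, v\beta_1, \cdots, v\beta_{r_2}, 0, \cdots, 0). \tag{2}$$

In particular,

$$L'(0)AL(0) = \operatorname{diag}(\alpha_1, \cdots, \alpha_{r_1}, 0, \cdots, 0). \tag{3}$$

If we differentiate (2) with respect to $v$ and subsequently set $v = 0$, we have

$$\frac{dL'(0)}{dv}AL(0) + L'(0)BL(0) + L'(0)A\frac{dL(0)}{dv}
= \operatorname{diag}(\underbrace{0, \cdots, 0}_{r_1}, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0). \tag{4}$$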





Since $L(v)$ is orthogonal, then $L'(v)L(v) = I$. Upon differentiating both
members with respect to $v$, and subsequently setting $v = 0$, it is seen that
$\dfrac{dL'(0)}{dv}L(0) = -L'(0)\dfrac{dL(0)}{dv}$, so that $L'(0)\dfrac{dL(0)}{dv}$
is a skew-symmetric matrix, say $S$. Further

$$\frac{dL'(0)}{dv} = -L'(0)\frac{dL(0)}{dv}L'(0) = -SL'(0), \tag{5}$$

and, by taking conjugates,

$$\frac{dL(0)}{dv} = -L(0)\frac{dL'(0)}{dv}L(0) = L(0)S. \tag{6}$$

If we multiply (5) on the right by $AL(0)$ and (6) on the left by $L'(0)A$, we see
that (4) may be written

$$L'(0)BL(0) = \operatorname{diag}(\underbrace{0, \cdots, 0}_{r_1}, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0)
+ SL'(0)AL(0) - L'(0)AL(0)S. \tag{7}$$


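Since $S$ is skew-symmetric and since $L'(0)AL(0)$ is given by (3), then each
element on the principal diagonal of $SL'(0)AL(0)$ and $L'(0)AL(0)S$ is zero.
Further, since $L'(0)BL(0)$ is symmetric, then $L'(0)BL(0)$ takes the form

$$\operatorname{diag}(\underbrace{0, \cdots, 0}_{r_1}, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0) + \|k_{ij}\|,$$

where $\|k_{ij}\|$ is a real symmetric matrix with $k_{ii} = 0$, $(i = 1, \cdots, n)$.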




Because the non-zero roots of the characteristic equation of $L'(0)BL(0)$ are
$\beta_1, \cdots, \beta_{r_2}$, then the sum of all two-rowed principal minors of
the determinant of $L'(0)BL(0)$ must equal the sum of the products of
$\beta_1, \cdots, \beta_{r_2}$ taken two at a time. That is

$$\sum_{i<j} \beta_i\beta_j = \sum_{i<j} \beta_i\beta_j - \sum_{i<j} k_{ij}^2,$$

so that each $k_{ij}$, being real, is zero. Accordingly, $SL'(0)AL(0) - L'(0)AL(0)S$
is a zero matrix and $L'(0)BL(0)$ is given by the first term in the right member
of (7). We then have

$$L'(0)AL(0)L'(0)BL(0) = L'(0)ABL(0) = 0,$$

from which it follows that $AB = 0$. Thus, if the real symmetric bilinear forms
$\theta_1$ and $\theta_2$ are independent in the probability sense, the product of
their matrices is zero.

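If, conversely, $AB = 0$, then

$$\begin{aligned}
\varphi(t_1, t_2) &= |I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1^2A^2 + t_2^2B^2)|^{-1/2} \\
&= |[I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2][I - 2\rho t_2B - (1 - \rho^2)t_2^2B^2]|^{-1/2} \\
&= \varphi(t_1, 0)\varphi(0, t_2),
\end{aligned}$$

and $\theta_1$ and $\theta_2$ are independent. This establishes the following theorem.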

Theorem I. Let $x$ and $y$ be normally correlated with means zero, unit variances,
and correlation coefficient $\rho$. Let $\theta_1$ and $\theta_2$ be two real
symmetric bilinear forms in $n$ random pairs of values of $x$ and $y$, say
$(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. A necessary and sufficient condition
that $\theta_1$ and $\theta_2$ be independent in the probability sense is that
the product of the matrices of the forms be zero.

5. Simultaneous reduction of quadratic or bilinear forms. The argument
of Section 4 may be used to establish in a very simple manner the following
theorem.

Theorem II. Let $A$ and $B$ be two real symmetric matrices with constant
elements, each matrix of order $n$. A necessary and sufficient condition that
there exist a real orthogonal matrix $L$ of order $n$ such that simultaneously
each of $L'AL$ and $L'BL$ is in canonical form, wherein no non-zero elements
occupy corresponding positions on the principal diagonals, is that $AB = 0$.

For if such an orthogonal matrix $L$ exists, it is evident that $L'ALL'BL =
L'ABL = 0$, from which it follows that $AB = 0$. Conversely, if $AB = 0$, then,
$v$ being a real scalar, the matrix $(A - \lambda I)(vB - \lambda I)$ is equal to
the matrix $-\lambda[(A + vB) - \lambda I]$. These matrices being equal, their
determinants are equal so that $A + vB$ belongs to the class $M(v)$ of Section 4.
Thus $L$ may be taken as $L(0)$ and simultaneously $L'AL$ and $L'BL$ are of the
form stated in the theorem.
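
The matrix identity used in the converse part of the proof can be checked
with exact arithmetic. The sketch below is only an illustration: the matrices
`A` and `B` (order 3, with `A = I - J/3` and `B = J/3`, `J` the all-ones
matrix) and the helper names are assumptions chosen here so that `AB = 0`;
they are not taken from the paper.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    """Elementwise sum of two matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in X]

n = 3
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
B = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]   # B = J/3
A = add(I, scale(-1, B))                                     # A = I - J/3
zero = [[Fraction(0)] * n for _ in range(n)]

def identity_holds(v, lam):
    """Check (A - lam I)(vB - lam I) == -lam [(A + vB) - lam I]."""
    left = matmul(add(A, scale(-lam, I)), add(scale(v, B), scale(-lam, I)))
    right = scale(-lam, add(add(A, scale(v, B)), scale(-lam, I)))
    return left == right

product_is_zero = matmul(A, B) == zero
checks = [identity_holds(Fraction(v), Fraction(l, 2))
          for v in (-2, 0, 1, 5) for l in (-3, 1, 4)]
print(product_is_zero, all(checks))
```

Expanding the left member gives $vAB - \lambda A - \lambda vB + \lambda^2 I$, so
the identity holds exactly when the term $vAB$ drops out, which is what the
check exercises for several values of $v$ and $\lambda$.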

6. Independence of bilinear and quadratic forms. Let $\theta = \sum\sum a_{jk}x_jy_k$
be a real symmetric bilinear form of rank $r_1$ in the previously defined variables
$(x_1, y_1), \cdots, (x_n, y_n)$ and let $q = \sum\sum b_{jk}x_jx_k$ be a real
symmetric quadratic form of rank $r_2$ in $x_1, x_2, \cdots, x_n$. As usual, denote
the non-zero roots of the characteristic equations of $A$ and $B$ by
$\alpha_1, \alpha_2, \cdots, \alpha_{r_1}$ and $\beta_1, \beta_2, \cdots, \beta_{r_2}$
respectively. The moment generating function of the joint distribution of
$\theta$ and $q$ is

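$$\varphi(t_1, t_2) = \frac{1}{(2\pi)^n(1 - \rho^2)^{n/2}}
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t_1\theta + t_2q - Q}\, dx_1 \cdots dy_n,$$

where, as previously,

$$Q = \frac{1}{2(1 - \rho^2)} \sum (x_i^2 + y_i^2 - 2\rho x_iy_i).$$

We first orthogonally transform the variables so that the exponent in the
integrand becomes, upon writing $\|f_{jk}\| = L'BL$,

$$t_1\sum \alpha_i x_i'y_i' + t_2\sum\sum f_{jk}x_j'x_k'
- \frac{1}{2(1 - \rho^2)} \sum (x_i'^2 + y_i'^2 - 2\rho x_i'y_i').$$

We then integrate on $y_1', y_2', \cdots, y_n'$ and obtain for the exponent in
the integrand

$$t_2\sum\sum f_{jk}x_j'x_k' - \tfrac{1}{2}\sum x_i'^2 + \rho t_1\sum \alpha_i x_i'^2
+ \frac{1 - \rho^2}{2}\, t_1^2 \sum \alpha_i^2 x_i'^2.$$

If we effect on the variables $x_1', x_2', \cdots, x_n'$ the inverse of the
orthogonal transformation initially used on the $x$'s and $y$'s, the exponent in
the integrand becomes, using $\|g_{jk}\| = A^2$,

$$t_2\sum\sum b_{jk}x_jx_k - \tfrac{1}{2}\sum x_i^2 + \rho t_1\sum\sum a_{jk}x_jx_k
+ \frac{1 - \rho^2}{2}\, t_1^2 \sum\sum g_{jk}x_jx_k$$

or

$$-\tfrac{1}{2}\sum\sum [\delta_{jk} - 2\rho t_1a_{jk} - (1 - \rho^2)t_1^2g_{jk} - 2t_2b_{jk}]x_jx_k,$$

where $\delta_{jk}$ equals 1 or 0 according as $j$ does or does not equal $k$. Hence,

$$\varphi(t_1, t_2) = |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2 - 2t_2B|^{-1/2}. \tag{8}$$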

If $\theta$ and $q$ are independent, we have

$$|I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2 - 2t_2B|
= |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2t_2B|, \tag{9}$$

for $0 < t_1 < h_1$ and $0 < t_2 < h_2$. As before, the members of (9) are
polynomials which, being equal for $0 < t_1 < h_1$, $0 < t_2 < h_2$, are equal for
all real values of $t_1$ and $t_2$. If we put $t_1 = 1$ and $t_2 = vt_1 = v$,
where $v$ is real, then (9) becomes

$$\begin{aligned}
|I - 2\rho A - (1 - \rho^2)A^2 - 2vB| &= |I - 2\rho A - (1 - \rho^2)A^2| \cdot |I - 2vB| \\
&= \prod_{j=1}^{r_1} [1 - (\rho - 1)\alpha_j][1 - (\rho + 1)\alpha_j] \prod_{j=1}^{r_2} (1 - 2v\beta_j).
\end{aligned}$$





That is,

$$|2\rho A + (1 - \rho^2)A^2 + 2vB - \lambda I|
= (-1)^n\lambda^{n - (r_1 + r_2)}[\lambda - 2\rho\alpha_1 - (1 - \rho^2)\alpha_1^2] \cdots
[\lambda - 2\rho\alpha_{r_1} - (1 - \rho^2)\alpha_{r_1}^2] \cdot [\lambda - 2v\beta_1] \cdots [\lambda - 2v\beta_{r_2}]$$

so that $2\rho A + (1 - \rho^2)A^2 + 2vB$ is a matrix of the class $M(v)$. Hence
we write

$$L'(v)[2\rho A + (1 - \rho^2)A^2 + 2vB]L(v)
= \operatorname{diag}(2\rho\alpha_1 + (1 - \rho^2)\alpha_1^2, \cdots, 2\rho\alpha_{r_1} + (1 - \rho^2)\alpha_{r_1}^2,
2v\beta_1, \cdots, 2v\beta_{r_2}, 0, \cdots, 0).$$

The argument of Section 4 shows that $L'(0)[2\rho A + (1 - \rho^2)A^2]L(0)L'(0)2BL(0)$
is a zero matrix, from which it follows that $2\rho AB + (1 - \rho^2)A^2B = 0$. But
this imposes on $\rho$, $n^2$ conditions of the form

$$2\rho\, l_{jk} + (1 - \rho^2)m_{jk} = 0, \qquad (j, k = 1, 2, \cdots, n).$$

Since these hold for every $-1 < \rho < 1$, they hold identically. Hence each
$l_{jk}$ and $m_{jk}$ is zero. In particular, $\|l_{jk}\| = AB = 0$ if $\theta$
and $q$ are independent. Conversely, if $AB = 0$, we see by Theorem II that (8)
becomes

$$\varphi(t_1, t_2) = \varphi(t_1, 0)\varphi(0, t_2),$$

so that $\theta$ and $q$ are independent. This yields Theorem III.

Theorem III. Let $x$ and $y$ be normally correlated with means zero, unit
variances, and correlation coefficient $\rho$. Let $\theta$ be a real symmetric
bilinear form in the $n$ random pairs of values of $x$ and $y$, say
$(x_1, y_1), \cdots, (x_n, y_n)$, and let $q$ be a real symmetric quadratic form
in $x_1, x_2, \cdots, x_n$ (or $y_1, \cdots, y_n$). A necessary and sufficient
condition that $\theta$ and $q$ be independent in the probability sense is that
the product of the matrices of the forms be zero.

For example, let $\theta$ be $n$ times the sample covariance and let $q$ be $n$
times the square of the mean of the $x$'s. Then

$$\theta = \sum (x_j - \bar{x})(y_j - \bar{y}) = \sum\sum a_{jk}x_jy_k,$$

where

$$a_{jk} = \frac{n - 1}{n} \text{ if } j = k, \qquad a_{jk} = -\frac{1}{n} \text{ otherwise,}$$

and

$$q = n\bar{x}^2 = \sum\sum b_{jk}x_jx_k, \qquad b_{jk} = 1/n \text{ for } j, k = 1, 2, \cdots, n.$$

Since $AB = 0$, then $\theta$ and $q$ are independent, a fact otherwise known but
perhaps not so easily established.
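
The claim $AB = 0$ for the covariance and mean example is easy to confirm with
exact fractions. The following sketch builds the two matrices from the
definitions of $a_{jk}$ and $b_{jk}$ above and also checks, on made-up data,
that the forms reproduce $n$ times the sample covariance and $n\bar{x}^2$. The
function names and the sample values are illustrative assumptions, not part
of the paper.

```python
from fractions import Fraction

def form_matrices(n):
    """Matrices of the forms: a_jk = (n-1)/n or -1/n, and b_jk = 1/n."""
    A = [[Fraction(n - 1, n) if j == k else Fraction(-1, n) for k in range(n)]
         for j in range(n)]
    B = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]
    return A, B

def product_is_zero(A, B):
    """True when every element of the matrix product AB vanishes."""
    n = len(A)
    return all(sum(A[i][k] * B[k][j] for k in range(n)) == 0
               for i in range(n) for j in range(n))

def bilinear(M, u, w):
    """Value of the form sum_j sum_k m_jk u_j w_k."""
    return sum(M[j][k] * u[j] * w[k]
               for j in range(len(u)) for k in range(len(w)))

# Made-up sample of n = 5 pairs, used only to exercise the identities.
xs = [Fraction(v) for v in (2, -1, 4, 0, 3)]
ys = [Fraction(v) for v in (1, 5, -2, 2, 0)]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

A, B = form_matrices(n)
theta_direct = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
theta_form = bilinear(A, xs, ys)
q_direct = n * xbar ** 2
q_form = bilinear(B, xs, xs)
zero_for_all_n = all(product_is_zero(*form_matrices(m)) for m in range(2, 8))
print(zero_for_all_n, theta_direct == theta_form, q_direct == q_form)
```

The product vanishes because every row of $A$ sums to zero while every column
of $B$ is constant.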


REFERENCES

[1] Allen T. Craig, "Note on the independence of certain quadratic forms," Annals of
Math. Stat., Vol. 14 (1943), pp. 195-197.

[2] Harold Hotelling, "Note on a matric theorem of A. T. Craig," Annals of Math.
Stat., Vol. 15 (1944), pp. 427-429.



ON THE CHARLIER TYPE B SERIES 


By S. Kullback 
George Washington University 

1. Introduction. The Type B series of Charlier has been discussed in some
detail in the literature (see references at the end of the paper). The problem
of the convergence of the Type B series has been considered by
Pollaczek-Geiringer [12], [13], Szegö [12] (page 110), Uspensky [18], Jacob [5],
Schmidt [16] and Obrechkoff [11]. There is presented in the following a method
of development of the Type B series which is believed to be of some interest,
including a necessary and sufficient condition for the convergence which is
basically the same as that of Schmidt [16]. A result of Steffensen [17] is
extended and shown to be related to the Charlier Type B series.


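2. Statement of results. Consider the function $p(r)$, defined for
$r = 0, 1, 2, \cdots$, and such that

$$\sum_{r=0}^{\infty} p(r) = 1; \qquad \sum_{r=0}^{\infty} |p(r)| = A \tag{2.1}$$

where $A$ is some finite value. Let the $n$-th factorial moment be defined by

$$\mu_{(0)} = 1, \qquad \mu_{(n)} = \sum_{r=0}^{\infty} r(r - 1)(r - 2) \cdots (r - n + 1)p(r), \quad (n = 1, 2, \cdots). \tag{2.2}$$

For arbitrary $\lambda$ let

$$L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n - 1)}{2!}\mu_{(n-2)}\lambda^2
- \frac{n(n - 1)(n - 2)}{3!}\mu_{(n-3)}\lambda^3 + \cdots + (-1)^n\lambda^n. \tag{2.3}$$

We prove the following results:

Theorem. A necessary and sufficient condition that the function $p(r)$ of (2.1)
may be expressed by the absolutely convergent series

$$p(r) = \frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!}
+ \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots \tag{2.4}$$

is that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \frac{1}{3!}|\mu_{(3)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots \tag{2.5}$$

converges, where $L_n$ is defined as in (2.3).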



3. Generating functions. For the function $p(r)$ of (2.1) consider the
generating function defined by

$$\varphi(z) = \sum_{r=0}^{\infty} z^rp(r) \tag{3.1}$$

where $z$ is a complex variable. Because of (2.1) it is clear that the right
member of (3.1) is uniformly and absolutely convergent for $|z| \leq 1$ so that
the radius of convergence of (3.1) is some value $R_1 \geq 1$.

The Taylor expansion of $\varphi(z)$ about the point $z = 1$ is given by

$$\varphi(z) = \varphi(1) + (z - 1)\varphi'(1) + \frac{(z - 1)^2}{2!}\varphi''(1) + \cdots \tag{3.2}$$

where, as may be readily obtained from (3.1),

$$\varphi^{(n)}(1) = \sum_{r=0}^{\infty} r(r - 1)(r - 2) \cdots (r - n + 1)p(r) = \mu_{(n)}. \tag{3.3}$$

If it is assumed that (2.5) converges, then

$$\varphi(z) = 1 + (z - 1)\mu_{(1)} + \frac{(z - 1)^2}{2!}\mu_{(2)} + \cdots + \frac{(z - 1)^n}{n!}\mu_{(n)} + \cdots \tag{3.4}$$

is uniformly and absolutely convergent for $|z - 1| \leq 1$.

4. Sufficiency. For arbitrary $\lambda$ let us set

$$e^{-\lambda(z - 1)}\left(1 + \mu_{(1)}(z - 1) + \frac{\mu_{(2)}}{2!}(z - 1)^2 + \cdots\right)
= 1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots \tag{4.1}$$

where the right member, because of (3.4), is absolutely convergent for
$|z - 1| \leq 1$. The coefficients on the right side of (4.1) are given by

$$L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n - 1)}{2!}\mu_{(n-2)}\lambda^2 - \cdots + (-1)^n\lambda^n \tag{4.2}$$

and the factorial moments may also be expressed by

$$\mu_{(n)} = L_n + nL_{n-1}\lambda + \frac{n(n - 1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n. \tag{4.3}$$

These relations are readily derived by expressing (4.1) symbolically as

$$e^{-\lambda(z - 1)}e^{\mu(z - 1)} = e^{L(z - 1)} \tag{4.4}$$

where after expansion $\mu^n$ and $L^n$ are to be replaced by $\mu_{(n)}$ and
$L_n$ respectively. (Cf. Jordan [7], p. 39). From (4.1) and (3.4) there is now
derived

$$\varphi(z) = e^{\lambda(z - 1)}\left(1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots\right). \tag{4.5}$$



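Since the right member of (4.5) is absolutely and uniformly convergent for
$|z - 1| \leq 1$ for arbitrary $\lambda$, it may be expressed as

$$\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)e^{\lambda(z - 1)}. \tag{4.6}$$

Since the right member of (4.6) converges for $|z - 1| < R_2$, where the radius
of convergence $R_2$ is some value such that $R_2 \geq 1$, it may be expressed as
a power series about $z = 0$, or

$$\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)
e^{-\lambda}\left(1 + \lambda z + \frac{\lambda^2z^2}{2!} + \cdots\right). \tag{4.7}$$

Recalling now the definition of $\varphi(z)$ as given in (3.1), there is obtained
by equating coefficients of like powers of $z$ in (3.1) and (4.7)

$$p(r) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)\frac{e^{-\lambda}\lambda^r}{r!}. \tag{4.8}$$

Since it may be readily shown that

$$\frac{\partial^n}{\partial\lambda^n}\frac{e^{-\lambda}\lambda^r}{r!} = (-1)^n\Delta^n\frac{e^{-\lambda}\lambda^r}{r!} \tag{4.9}$$

where

$$\Delta\frac{e^{-\lambda}\lambda^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} - \frac{e^{-\lambda}\lambda^{r-1}}{(r - 1)!}$$

and

$$\Delta^n\frac{e^{-\lambda}\lambda^r}{r!} = \Delta^{n-1}\frac{e^{-\lambda}\lambda^r}{r!} - \Delta^{n-1}\frac{e^{-\lambda}\lambda^{r-1}}{(r - 1)!},$$

we may also write (4.8) as

$$p(r) = \frac{e^{-\lambda}\lambda^r}{r!} - L_1\Delta\frac{e^{-\lambda}\lambda^r}{r!}
+ \frac{L_2}{2!}\Delta^2\frac{e^{-\lambda}\lambda^r}{r!} - \frac{L_3}{3!}\Delta^3\frac{e^{-\lambda}\lambda^r}{r!} + \cdots. \tag{4.10}$$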


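5. Necessity. Assume that the function $p(r)$ of (2.1), for arbitrary
$\lambda$, is given by the absolutely convergent series

$$p(r) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)\frac{e^{-\lambda}\lambda^r}{r!}. \tag{5.1}$$

Since $e^{-\lambda}\lambda^r/r!$ is continuous with respect to $\lambda$, there
follows, where $z$ is a complex variable and $|z| \leq 1$,

$$\begin{aligned}
\sum_{r=0}^{\infty} z^rp(r) &= \sum_{r=0}^{\infty} \frac{z^re^{-\lambda}\lambda^r}{r!}
+ L_1\frac{\partial}{\partial\lambda}\sum_{r=0}^{\infty} \frac{z^re^{-\lambda}\lambda^r}{r!}
+ \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\sum_{r=0}^{\infty} \frac{z^re^{-\lambda}\lambda^r}{r!} + \cdots \\
&= e^{\lambda(z - 1)}\left(1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots\right) \\
&= 1 + M_1(z - 1) + \frac{M_2}{2!}(z - 1)^2 + \frac{M_3}{3!}(z - 1)^3 + \cdots
\end{aligned} \tag{5.2}$$

where

$$M_n = L_n + nL_{n-1}\lambda + \frac{n(n - 1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n. \tag{5.3}$$

From (5.2) it follows that

$$M_n = \mu_{(n)} \tag{5.4}$$

where $\mu_{(n)}$ is as defined in (3.3). Since (5.1) becomes for $r = 0$, $\lambda = 0$

$$p(0) = 1 - \mu_{(1)} + \frac{1}{2!}\mu_{(2)} - \frac{1}{3!}\mu_{(3)} + \cdots \tag{5.5}$$

the assumed absolute convergence implies that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \frac{1}{3!}|\mu_{(3)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots \tag{5.6}$$

converges.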


6. Remarks. Obrechkoff [11] shows that his result includes those of
Pollaczek-Geiringer [12], Szegö [12] (p. 110) and Jacob [5]. His theorem states
that if the function $p(r)$, $(r = 0, 1, 2, \cdots)$, satisfies the following
conditions

(0.1) Z2V | p(r) | 

Tmr 1 


is convergent for each finite number A, and 


( 6 . 2 ) 


_(4X) n y> I p(r)J 

(« + l)l£j r 


(e'x'/rir 1 


tends toward zero as $n$ increases indefinitely, then $p(r)$ may be expressed in
a convergent Charlier Type B series.

Uspensky [18] shows that if

$$\sum_{r=0}^{\infty} z^rp(r) \tag{6.3}$$

has a radius of convergence $R > 2$ then $p(r)$ may be expressed in a convergent
Charlier Type B series.

Schmidt [16] shows that a necessary and sufficient condition for the convergence
is that the function $\varphi(z)$ defined as in (3.1) (he does not explicitly
impose the condition (2.1) on $p(r)$) be regular inside the two circles
$|z| < 1$ and $|z - 1| < 1$ and with all its derivatives is continuous on the
peripheries also. In the case that $p(r) \geq 0$, the condition (2.5) is
stronger; in fact in this case Schmidt [16] shows that a necessary and
sufficient condition is that

$$\lim_{r \to \infty} p(r)2^rr^k = 0$$

for all integral $k \geq 0$. If $p(r) \geq 0$, then Uspensky's condition is only
just enough stronger than Schmidt's to keep it from being sufficient.

If (6.1) is satisfied, or if (6.3) is satisfied, then (3.1) is absolutely
convergent for $|z| \leq 2$. Therefore, the point $z = 2$ is contained in the
circle of convergence of (3.2) or (3.4), which implies that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots$$

converges.

It is deemed worthy of special mention to point out, as both Schmidt and
Uspensky have done, the striking fact that the necessary and sufficient condition
for the validity of (2.4) is independent of $\lambda$. This arbitrariness of
$\lambda$ enables us to dispose of it so as to obtain better convergence. Indeed
if we set $\lambda = \mu_{(1)}$ then, as is evident from (4.2), $L_1 = 0$.


7. Special cases. It is of interest to note that (4.8) is the Taylor expansion
if $p(r) = e^{-m}m^r/r!$, $(r = 0, 1, 2, \cdots)$, for then (4.2) becomes

$$L_n = (m - \lambda)^n \tag{7.1}$$

since for the Poisson Exponential Distribution $e^{-m}m^r/r!$, $(r = 0, 1, 2, \cdots)$,
$\mu_{(n)} = m^n$, and (4.8) is then

$$\frac{e^{-m}m^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} + (m - \lambda)\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!}
+ \frac{(m - \lambda)^2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots. \tag{7.2}$$

If $p(r)$ is finite, that is if $p(r) = 0$ for $r > n + 1$, then $\mu_{(k)} = 0$
for $k > n + 1$. Thus, for a finite function the condition (2.5) is satisfied.
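
The Poisson special case (7.2) lends itself to a quick numerical check. The
sketch below replaces each derivative with respect to $\lambda$ by $(-1)^n$
times an $n$-th backward difference in $r$, as in (4.9), and sums the series
with $L_n = (m - \lambda)^n$. The values of `m` and `lam`, the number of terms
and the function names are illustrative assumptions rather than anything
prescribed by the paper.

```python
from math import comb, exp, factorial

def poisson(lam, r):
    """Poisson term e^{-lam} lam^r / r!, taken as 0 for negative r."""
    if r < 0:
        return 0.0
    return exp(-lam) * lam ** r / factorial(r)

def backward_difference(lam, r, n):
    """n-th backward difference in r of the Poisson term, as in (4.9)."""
    return sum((-1) ** k * comb(n, k) * poisson(lam, r - k)
               for k in range(n + 1))

def taylor_series(m, lam, r, terms=40):
    """Partial sum of (7.2), each derivative written as (-1)^n Delta^n."""
    return sum(
        (m - lam) ** n / factorial(n) * (-1) ** n * backward_difference(lam, r, n)
        for n in range(terms)
    )

m, lam = 2.0, 1.5   # illustrative values, not taken from the paper
errors = [abs(taylor_series(m, lam, r) - poisson(m, r)) for r in range(6)]
print(max(errors))
```

For fixed $r$ only the first $r + 1$ terms of each difference are non-zero, so
the partial sums settle quickly when $|m - \lambda|$ is moderate.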


8. Factorial moments. For functions $p(r)$, $(r = 0, 1, 2, \cdots)$, satisfying
(2.5), there may be derived from (3.1) and (3.4) the relation

$$r!\,p(r) = \mu_{(r)} - \mu_{(r+1)} + \frac{1}{2!}\mu_{(r+2)} - \frac{1}{3!}\mu_{(r+3)} + \cdots, \quad (r = 0, 1, 2, \cdots), \tag{8.1}$$

since each side is $\varphi^{(r)}(0)$ derived respectively from (3.1) and (3.4).
It should be noted that for $\lambda = 0$ (4.5) leads to (8.1) rather than (4.8)
so that (8.1) may be considered as the Charlier Type B series for $\lambda = 0$.
The result (8.1) was derived for finite functions by Steffensen [17]. (Also
compare Kaplansky [8]). This may also be expressed symbolically by

$$p(r) = \mu^re^{-\mu}/r!, \quad (r = 0, 1, 2, \cdots), \tag{8.2}$$

where after expansion $\mu^n$ is to be replaced by $\mu_{(n)}$. It is of
interest to note the relation between the symbolic expression for $p(r)$ as a
Poisson Exponential in (8.2) and the series (4.8); for (4.8) may be expressed
symbolically as

$$p(r) = e^{L(\partial/\partial\lambda)} \cdot \frac{e^{-\lambda}\lambda^r}{r!}
= \frac{e^{-(\lambda + L)}(\lambda + L)^r}{r!} = \frac{\mu^re^{-\mu}}{r!} \tag{8.3}$$

since $e^{a(d/dx)}f(x) = f(x + a)$ and the relations (4.2), (4.3), (4.4).
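
For a finite function the series (8.1) terminates, so it can be verified
exactly. The sketch below computes the factorial moments of a small frequency
function and recovers each $p(r)$ from them. The function `p` (a binomial with
three trials and probability one half) and the helper names are assumptions
made for illustration.

```python
from fractions import Fraction
from math import factorial

def falling(r, k):
    """Falling factorial r (r - 1) ... (r - k + 1), equal to 1 when k = 0."""
    out = 1
    for i in range(k):
        out *= r - i
    return out

def factorial_moments(p):
    """mu_(k) for k = 0 .. len(p) - 1; higher moments of a finite p vanish."""
    return [sum(falling(r, k) * pr for r, pr in enumerate(p))
            for k in range(len(p))]

def recover(mu, r):
    """p(r) from relation (8.1): r! p(r) = sum_k (-1)^k mu_(r+k) / k!."""
    total = sum(Fraction((-1) ** k, factorial(k)) * mu[r + k]
                for k in range(len(mu) - r))
    return total / factorial(r)

# Illustrative finite function: binomial(3, 1/2).
p = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
mu = factorial_moments(p)
recovered = [recover(mu, r) for r in range(len(p))]
print(mu, recovered == p)
```

Here $\mu_{(1)} = 3/2$, $\mu_{(2)} = 3/2$ and $\mu_{(3)} = 3/4$, and for $r = 0$
the alternating sum is $1 - 3/2 + 3/4 - 1/8 = 1/8$.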





9. Illustrations. Consider the function

$$p(r) = 1/2^{r+1}, \quad (r = 0, 1, 2, \cdots). \tag{9.1}$$

For this function

$$\varphi(z) = \sum_{r=0}^{\infty} z^rp(r) = 1/(2 - z) \tag{9.2}$$

and

$$\varphi^{(n)}(1) = \mu_{(n)} = n! \tag{9.3}$$

so that (2.5) becomes

$$1 + 1 + 1 + \cdots \tag{9.4}$$

which does not converge. (It may be of interest to note that for this case
(8.1) yields

$$p(0) = 1 - 1 + 1 - 1 + 1 - \cdots. \tag{9.5}$$

The series on the right in (9.5) is not convergent but is summable $C_1$ to
$\tfrac{1}{2}$. For the latter see for example R. P. Agnew [19].) In this case
the first several coefficients of (4.8) are, for $\lambda = 1$,

$$\begin{aligned}
L_1 &= 0, & \frac{L_2}{2!} &= .5000, & \frac{L_3}{3!} &= .3333, & \frac{L_4}{4!} &= .3750, \\
\frac{L_5}{5!} &= .3667, & \frac{L_6}{6!} &= .3681, & \frac{L_7}{7!} &= .3679, & &\cdots.
\end{aligned} \tag{9.6}$$

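Let us now consider the function

$$p(0) = \tfrac{1}{2}, \qquad p(r) = 1/3^r, \quad (r = 1, 2, \cdots). \tag{9.7}$$

For this function

$$\varphi(z) = \sum_{r=0}^{\infty} z^rp(r) = \frac{1}{2} + \frac{z}{3 - z} \tag{9.8}$$

and

$$\varphi^{(n)}(1) = \mu_{(n)} = \frac{3}{2}\,\frac{n!}{2^n}, \quad (n = 1, 2, \cdots), \tag{9.9}$$

so that (2.5) becomes

$$1 + \left(\frac{3}{2}\right)\frac{1}{2} + \left(\frac{3}{2}\right)\frac{1}{2^2} + \left(\frac{3}{2}\right)\frac{1}{2^3} + \cdots \tag{9.10}$$

which converges. For this case (8.1) yields

$$\begin{aligned}
p(0) &= 1 - \left(\frac{3}{2}\right)\frac{1}{2} + \left(\frac{3}{2}\right)\frac{1}{2^2} - \left(\frac{3}{2}\right)\frac{1}{2^3} + \cdots = \frac{1}{2}, \\
p(1) &= \left(\frac{3}{2}\right)\frac{1}{2} - 2!\left(\frac{3}{2}\right)\frac{1}{2^2} + \frac{3!}{2!}\left(\frac{3}{2}\right)\frac{1}{2^3} - \cdots = \frac{1}{3},
\end{aligned} \tag{9.11}$$

etc.

In this case, the first several coefficients of (4.8) are, for $\lambda = 0.75$,

$$\begin{aligned}
L_1 &= 0, & \frac{L_2}{2!} &= .093750, & \frac{L_3}{3!} &= .046875, & \frac{L_4}{4!} &= .019043, \\
\frac{L_5}{5!} &= .010840, & \frac{L_6}{6!} &= .005173, & \frac{L_7}{7!} &= .002622, & &\cdots.
\end{aligned} \tag{9.12}$$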

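Let us now consider the function (suggested by Prof. C. Wexler)

$$p(0) = \frac{5}{3}, \qquad p(r) = (-1)^r\,\frac{5}{3}\left(\frac{2}{3}\right)^r, \quad (r = 1, 2, \cdots). \tag{9.13}$$

For this function

$$\sum p(r) = 1, \qquad \sum |p(r)| = 5 \tag{9.14}$$

$$\varphi(z) = \sum_{r=0}^{\infty} z^rp(r) = 5/(3 + 2z) \tag{9.15}$$

$$\varphi^{(n)}(1) = \mu_{(n)} = (-1)^nn!\,(2/5)^n. \tag{9.16}$$

In this case (2.5) becomes

$$1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \left(\frac{2}{5}\right)^3 + \cdots \tag{9.17}$$

which converges and (8.1) yields

$$\begin{aligned}
p(0) &= 1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \left(\frac{2}{5}\right)^3 + \cdots = 5/3, \\
p(1) &= -2/5 - 2!\,(2/5)^2 - \frac{3!}{2!}(2/5)^3 - \cdots = -\frac{5}{3} \cdot \frac{2}{3},
\end{aligned} \tag{9.18}$$

etc.

Note that for this case (6.1) or (6.3) are not satisfied. Using $\lambda = 1$,
it is found that

$$L_1 = -1.4, \quad \frac{L_2}{2!} = 1.06, \quad \frac{L_3}{3!} = -.5906, \quad \frac{L_4}{4!} = .2779, \quad \cdots. \tag{9.19}$$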
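
The tabulated coefficients in (9.6), (9.12) and (9.19) can be recomputed from
relation (4.2) and the factorial moments (9.3), (9.9) and (9.16). The sketch
below does so with exact fractions; the function names are assumptions
introduced here, and the printed decimals should agree with the tables up to
the last printed digit.

```python
from fractions import Fraction
from math import comb, factorial

def type_b_coefficients(mu, lam, nmax):
    """Return [L_n / n! for n = 0 .. nmax], with L_n taken from (4.2)."""
    coeffs = []
    for n in range(nmax + 1):
        L_n = sum(comb(n, k) * (-lam) ** k * mu(n - k) for k in range(n + 1))
        coeffs.append(L_n / factorial(n))
    return coeffs

def mu_first(n):
    """Factorial moments (9.3): mu_(n) = n!."""
    return Fraction(factorial(n))

def mu_second(n):
    """Factorial moments (9.9): mu_(0) = 1, mu_(n) = (3/2) n! / 2^n."""
    return Fraction(1) if n == 0 else Fraction(3 * factorial(n), 2 ** (n + 1))

def mu_third(n):
    """Factorial moments (9.16): mu_(n) = (-1)^n n! (2/5)^n."""
    return Fraction((-2) ** n * factorial(n), 5 ** n)

first = type_b_coefficients(mu_first, Fraction(1), 7)       # lambda = 1
second = type_b_coefficients(mu_second, Fraction(3, 4), 7)  # lambda = 0.75
third = type_b_coefficients(mu_third, Fraction(1), 4)       # lambda = 1

for label, values in (("(9.6)", first), ("(9.12)", second), ("(9.19)", third)):
    print(label, [round(float(v), 6) for v in values[1:]])
```

In the first illustration the coefficients are the partial sums of the series
for $e^{-1}$, which is why they approach .3679 without tending to zero.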





REFERENCES

[1] A. C. Aitken, Statistical Mathematics, London, 1939, pp. 58-59, pp. 66-67, pp. 76-79.

[2] L. A. Aroian, "The type B Gram-Charlier series," Annals of Math. Stat., Vol. 8 (1937),
p. 183.

[3] C. V. L. Charlier, "Über die Darstellung willkürlicher Funktionen," Arkiv f. Math.
Astron. o. Fysik, Vol. 2 (1905-6), pp. 1-33.

[4] Arne Fisher, The Mathematical Theory of Probabilities, Macmillan, 1923, pp. 271-279.

[5] M. Jacob, "Über die Charlier'sche B-Reihe," Skand. Aktuarie-tids., Vol. 15 (1932),
pp. 286-291.

[6] Charles Jordan, "Sur la probabilité des épreuves répétées, le théorème de Bernoulli
et son inversion," Bull. Soc. Math. de France, Vol. 54 (1926), pp. 101-137.

[7] Charles Jordan, Statistique Mathématique, Paris, 1927, p. 39.

[8] I. Kaplansky, "The asymptotic distribution of runs of consecutive elements," Annals
of Math. Stat., Vol. 16 (1945), pp. 200-203.

[9] M. Kendall, Advanced Theory of Statistics, Vol. 1, Griffin, 1943, pp. 154-156.

[10] R. v. Mises, Wahrscheinlichkeitsrechnung, Wien, 1931, pp. 265-269.

[11] N. Obrechkoff, "Sur la loi de Poisson, la série de Charlier et les équations linéaires
aux différences finies du premier ordre à coefficients constants," Actualités Sci.
et Ind., Vol. 740 (1938).

[12] H. Pollaczek-Geiringer, "Die Charlier'sche Entwicklung willkürlicher Verteilungen,"
Skand. Aktuarie-tids., Vol. 11 (1928), pp. 98-111.

[13] H. Pollaczek-Geiringer, "Über die Poissonsche Verteilung und die Entwicklung
willkürlicher Verteilungen," Zeits. f. Angew. Math., Vol. 8 (1928), pp. 292-309.

[14] H. L. Rietz, Mathematical Statistics, Chicago, 1927, pp. 60-68, pp. 170-172.

[15] E. Schmidt, Sitzungsberichte der Preuss. Akad. d. Wiss., 1928, p. 148.

[16] E. Schmidt, "Über die Charlier-Jordansche Entwicklung einer willkürlichen Funktion
nach der Poissonschen Funktion und ihren Ableitungen," Zeits. Angew. Math.
Mech., Vol. 13 (1933), pp. 139-142.

[17] J. F. Steffensen, "Factorial moments and discontinuous frequency functions,"
Skand. Aktuarie-tids., Vol. 6 (1923), pp. 73-89.

[18] J. V. Uspensky, "On Charles Jordan's series for probability," Annals of Math., Vol.
32 (1931), pp. 306-312.

[19] R. P. Agnew, "Summability of power series," Amer. Math. Monthly, Vol. 53 (1946),
pp. 251-259.



NOTES 

This section is devoted to brief research and expository articles on methodology
and other short items.

ON SMALL-SAMPLE ESTIMATION 

By George W. Brown 
Iowa State College 

1. Summary. This paper discusses some of the concepts underlying small 
sample estimation and reexamines, in particular, the current notions on “un¬ 
biased” estimation. Alternatives to the usual unbiased property are examined 
with respect to invariance under simultaneous one-to-one transformation of 
parameter and estimate; one of these alternatives, closely related to the maxi¬ 
mum likelihood method, seems to be new. The property of being unbiased in 
the likelihood sense is essentially equivalent to the statement that the estimate 
is a maximum likelihood estimate based on some distribution derived by inte¬ 
gration from the original sampling distribution, by virtue of a “hereditary” 
property of maximum likelihood estimation. 

An exposition of maximum likelihood estimation is given in terms of optimum 
pairwise selection with equal weights, providing a type of rationale for small 
sample estimation by maximum likelihood. 

2. Introduction. In large sample theory of estimation the problems are
generally formulated in terms of a random variable $x = (x_1, x_2, \cdots, x_n)$
and a product distribution with, say, a density
$g(x|\theta) = f(x_1|\theta)f(x_2|\theta) \cdots f(x_n|\theta)$
where $n$ is permitted to increase without limit. For small sample theory it is
sufficient to consider an arbitrary distribution, not necessarily of product form,
depending on a parameter $\theta$. For convenience we will assume a distribution
density of fixed form $g(x|\theta)$, where $x$ is in Euclidean $n$-space and
$\theta$ in Euclidean $k$-space, $k < n$. Granting at the outset that a complete
rationale for estimation must be based on considerations like those of Wald
[4,1939] dealing with specified risk functions, it is still a difficult process,
in practice, to specify the risk functions and solve the ensuing mathematics
problems. It may still be to the point, then, to consider general properties that
estimates might be required to have in order to be considered "acceptable", or
perhaps even "optimum", over a class of "acceptable" estimates.

In large-sample theory the situation is fairly simple. Consistent estimates
have the property that the estimate converges in probability to the true
parameter value. "Best" or "optimum" estimates are defined in terms of the order
of convergence, or asymptotic variance. All reasonable definitions of "optimum"
become asymptotically equivalent, since they all measure essentially the rate of
convergence, so that one might ask for least variance, or least expected absolute
deviation, or least expected $k$th power, without affecting the optimum estimate,
in general. Moreover, the consistency property and the optimum properties
are in general invariant under simultaneous one-to-one transformation of the
parameter and its estimate, i.e., the square of an asymptotically optimum
estimate of $\sigma$ will be an asymptotically optimum estimate of $\sigma^2$.
Finally, a general estimation method, the method of maximum likelihood, leads to
optimum estimates in large samples.

In small samples, on the other hand, the search for corresponding criteria has 
led to the investigation of best “unbiased” estimates, and the like, where few, 
if any, of the definitions discussed possess an invariance property under simul¬ 
taneous one-to-one transformation of the parameter and its estimate. 

3. Unbiased estimation. To ensure, in small-sample estimation, that an
estimate bears some relation to the parameter it is estimating, it has become the
custom to require that an estimate be unbiased, which means that the expected
value of the estimate agrees with the parameter value. This condition was
suggested by the consistency property which is required in large-sample
estimation. It ensures, moreover, that the average of a large number of
independent estimates made on the same basis will provide a consistent estimate,
in the large sample sense. While this consistency property of the average may at
times be convenient in practical situations, the fact remains that the problem of
estimation from a number of such observations is a different estimation problem,
the "best" solution to which need not be the average of the "best" solutions of
the original problem corresponding to estimation of $\theta$ from a single
observation on $x$, where $x$ has a density $g(x|\theta)$. More to the point,
however, is the objection that an unbiased estimate of a parameter does not in
general transform into an unbiased estimate when both estimate and parameter are
subjected to the same one-to-one transformation. Moreover, one can easily
construct situations for which the only acceptable unbiased estimates are clearly
inferior, from almost any point of view, to estimates which are biased (Girshick,
Mosteller and Savage [1,1946], and Halmos [2,1946]).

It may be of interest to consider a few reasonable alternatives to the lack of 
bias requirement, which seem to accomplish as much as the conventional defini¬ 
tion and which, in addition, have an invariance under one-to-one transformation 
of the parameter and estimate. To avoid confusion, let us attach the qualifying 
prefix “mean” to the usual unbiased property, so that an estimate will be said 
to be mean-unbiased if its expected value agrees with the parameter value. 

Consider as one alternative the following property. An estimate of a
one-dimensional parameter $\theta$ will be said to be median-unbiased if, for
fixed $\theta$, the median of the distribution of the estimate is at the value
$\theta$, i.e., the estimate underestimates just as often as it overestimates.
This requirement seems for most purposes to accomplish as much as the
mean-unbiased requirement and has the additional property that it is invariant
under one-to-one transformation.





A different alternative requirement which is invariant under transformations
is suggested by the definition of unbiased tests of significance (Neyman and
Pearson [3,1936]). Let us say that an estimate is likelihood-unbiased if
$h(\theta \mid \theta') \leq h(\theta \mid \theta)$, where the estimate
$\hat\theta$ has probability density $h(\hat\theta \mid \theta)$. In other words,
an estimation method is likelihood-unbiased if estimates in the neighborhood of a
given parameter value $\theta$ would occur more frequently when the true value is
itself $\theta$ than when it differs from $\theta$. On intuitive grounds this
seems to be an acceptable kind of requirement, applicable to a very general class
of estimation problems. It is evident that the assumption of a density plays no
important role here; the situation is analogous to the maximum likelihood
situation. The property itself is invariant under simultaneous one-to-one
transformations of parameter and estimate for the same reason that maximum
likelihood estimates are invariant under such transformations; in fact one can
readily see that the likelihood-unbiased condition is equivalent to requiring
that $\hat\theta$ have such a distribution, as a function of $\theta$, that the
maximum likelihood estimate of $\theta$ based on $\hat\theta$ will be actually
equal to $\hat\theta$. The obvious implication of this fact is that if a function
$\phi$ is given (possibly a sufficient statistic for $\theta$) then there is
an essentially unique likelihood-unbiased estimate $\hat\theta$ based on $\phi$,
obtained by finding the maximum likelihood estimate of $\theta$ in the
distribution of $\phi$ as a function of $\theta$.

As an example, consider the estimation of $\sigma^2$ from a sample of $n$
observations from a normal distribution. Let $S^2$ be the usual sum of squares,
where $S^2/\sigma^2$ is distributed like $\chi^2$ on $n - 1$ degrees of freedom.
Then the only likelihood-unbiased estimate of $\sigma^2$ based on $S^2$ is
$S^2/(n - 1)$. In this case $S^2/(n - 1)$ is also mean-unbiased, a fact which is
normally quoted as justification for the division by $n - 1$. Curiously enough,
it is customary to estimate $\sigma$ by $\sqrt{S^2/(n - 1)}$, even though this is
a biased estimate of $\sigma$, according to the usual notion of "unbiased",
referred to here as "mean-unbiased". On the other hand, $\sqrt{S^2/(n - 1)}$ is a
perfectly good likelihood-unbiased estimate of $\sigma$, by virtue of the
invariance under transformations. It might be pointed out, in passing, that the
estimate $S^2/(n - 1)$ does not have minimum mean square error about $\sigma^2$,
but that the optimum divisor for minimizing the mean square error about
$\sigma^2$ is $n + 1$.
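
The remark about the divisor $n + 1$ follows from the first two moments of
$\chi^2$. As a sketch, write $\nu = n - 1$, so that $E(S^2/\sigma^2) = \nu$ and
$E[(S^2/\sigma^2)^2] = \nu(\nu + 2)$; the mean square error of $S^2/c$ about
$\sigma^2$, in units of $\sigma^4$, is then $\nu(\nu + 2)/c^2 - 2\nu/c + 1$. The
function names below are assumptions introduced for this illustration.

```python
from fractions import Fraction

def mse_over_sigma4(n, c):
    """E[(S^2/c - sigma^2)^2] / sigma^4 when S^2/sigma^2 is chi-square(n - 1)."""
    nu = n - 1
    second_moment = nu * (nu + 2)          # E[(S^2 / sigma^2)^2]
    return Fraction(second_moment, c * c) - Fraction(2 * nu, c) + 1

def best_divisor(n, candidates):
    """Candidate divisor with the smallest mean square error."""
    return min(candidates, key=lambda c: mse_over_sigma4(n, c))

n = 10
print(mse_over_sigma4(n, n - 1))   # divisor n - 1: 2/(n - 1)
print(mse_over_sigma4(n, n + 1))   # divisor n + 1: 2/(n + 1)
print(best_divisor(n, range(1, 3 * n)))
```

Differentiating the expression with respect to $c$ gives the stationary point
$c = \nu + 2 = n + 1$, at which the mean square error is $2\sigma^4/(n + 1)$
compared with $2\sigma^4/(n - 1)$ for the divisor $n - 1$.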

The fact that a likelihood-unbiased estimate is the maximum likelihood
estimate based on the distribution of the estimate itself suggests further
examination of maximum likelihood estimates. If we define a simple estimate as
one which completely determines a probability distribution for $x$, then we have
as a theorem the following:

A simple maximum likelihood estimate $\hat\theta(x)$ is likelihood-unbiased. What
this means is essentially that maximum likelihood is "hereditary", i.e., if
$\hat\theta(x)$ maximizes $g(x \mid \theta)$ in a space of $n$ dimensions, and
$\hat\theta$ has a derived density $h(\hat\theta \mid \theta)$ in a space of
$k < n$ dimensions, then $\theta = \hat\theta$ maximizes
$h(\hat\theta \mid \theta)$. The proof follows readily from the fact that
$h(\hat\theta \mid \theta)$ is obtained by integration of $g(x \mid \theta)$ over
all $x$ such that $\hat\theta(x) = \hat\theta$.





The example of estimating $\sigma^2$, quoted above, shows that the word "simple"
cannot be omitted from the statement above. For example, the simple estimate
in the parent distribution is the joint estimate $(\bar{x}, S^2/n)$ of
$(\mu, \sigma^2)$ and in fact the joint estimate is likelihood-unbiased. On the
other hand, $S^2/n$ is not a simple maximum likelihood estimate, and we observe
that $S^2/n$ is not likelihood-unbiased. $S^2/(n - 1)$ is a simple maximum
likelihood estimate of $\sigma^2$ based on the distribution of $S^2$ itself, so
that $S^2/(n - 1)$ is, as a result, likelihood-unbiased.

One can exhibit situations in which the conventional mean-unbiased property
is very unnatural, while the likelihood-unbiased property may be quite natural.
Consider, for example, the case where $\sigma^2$ is to be estimated by use of a
$\chi^2$-distributed $S^2$ with $n - 1$ degrees of freedom, but subject to the
condition $\sigma^2 \geq \sigma_0^2$, where $\sigma_0^2$ is known in advance. Then
the estimate $\hat\sigma^2 = \max[S^2/(n - 1), \sigma_0^2]$ is certainly biased
according to conventional definitions, but is nevertheless likelihood-unbiased.
To get a mean-unbiased estimate when $\sigma^2$ is near to $\sigma_0^2$ is
impossible except by admitting estimates less than $\sigma_0^2$, which is clearly
foolish if it is known that $\sigma^2 \geq \sigma_0^2$.

It may be of interest to include a brief discussion of maximum likelihood estimation
in terms of pairwise selection of alternatives, providing a sort of optimum
property for maximum likelihood estimation in small samples, in addition to the
likelihood-unbiased property. Consider a choice to be made between only two
alternative values of $\theta$, say $\theta_0$ and $\theta_1$, by dividing the sample space into two
regions $S_0$ and $S_1$, such that $\theta_0$ is accepted when $x$ falls in $S_0$ and $\theta_1$ is accepted
when $x$ falls in $S_1$. Then

$P_{\theta_0}(S_0) + P_{\theta_0}(S_1) = P_{\theta_1}(S_0) + P_{\theta_1}(S_1) = 1.$

$P_{\theta_1}(S_0)$ is the probability of making the error of accepting $\theta_0$ when $\theta = \theta_1$ and
$1 - P_{\theta_0}(S_0)$ is the probability of making the error of accepting $\theta_1$ when $\theta = \theta_0$.
If the two errors are weighted equally, it is evident that a “best” test will choose
$S_0$ so as to minimize $P_{\theta_1}(S_0) + 1 - P_{\theta_0}(S_0)$. It is well known that $S_0$ will
minimize the indicated quantity if $S_0$ consists of all points $x$ such that $g(x \mid \theta_0) >
g(x \mid \theta_1)$. Thus we may speak of the region $S_0$ defined by $g(x \mid \theta_0) > g(x \mid \theta_1)$
as an optimum equal risk acceptance region for $\theta_0$ against $\theta_1$. Now if we transfer
our attention to the general estimation problem we see that the maximum
likelihood estimate $\hat\theta(x)$ is that value of $\theta$ which would be accepted by the optimum
equal risk acceptance procedure against all other $\theta$'s.
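
The optimum equal risk acceptance region can be checked by brute force in a small discrete case. The sketch below is only an illustration and is not taken from the paper: it assumes a binomial observation with 6 trials and the two hypothetical alternatives 0.3 and 0.6, enumerates every possible acceptance region, and compares the best one with the region on which the likelihood under the first alternative exceeds that under the second. All function names are invented for the example.

```python
from itertools import combinations
from math import comb


def binom_pmf(k, n, theta):
    """Probability of k successes in n binomial trials with success chance theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def error_sum(region, n, theta0, theta1):
    """P_theta1(S0) + 1 - P_theta0(S0), where S0 = region accepts theta0."""
    p1 = sum(binom_pmf(k, n, theta1) for k in region)
    p0 = sum(binom_pmf(k, n, theta0) for k in region)
    return p1 + 1.0 - p0


def likelihood_region(n, theta0, theta1):
    """All outcomes k whose likelihood is larger under theta0 than under theta1."""
    return frozenset(
        k for k in range(n + 1)
        if binom_pmf(k, n, theta0) > binom_pmf(k, n, theta1)
    )


def best_region_by_search(n, theta0, theta1):
    """Exhaustive search over all subsets of {0, ..., n} for the smallest error sum."""
    best_error, best_set = None, None
    for size in range(n + 2):
        for region in combinations(range(n + 1), size):
            e = error_sum(region, n, theta0, theta1)
            if best_error is None or e < best_error:
                best_error, best_set = e, frozenset(region)
    return best_error, best_set
```

With these assumed values the exhaustive search and the likelihood comparison select the same region, which is the pairwise property the paragraph attributes to the maximum likelihood estimate.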

REFERENCES 

[1] M. A. Girshick, Frederick Mosteller, and L. J. Savage, “Unbiased estimates for
certain binomial sampling problems with applications,” Annals of Math. Stat.,
Vol. 17 (1946), p. 13.

[2] Paul R. Halmos, “The theory of unbiased estimation,” Annals of Math. Stat., Vol. 17
(1946), p. 34.

[3] J. Neyman and E. S. Pearson, “Unbiased critical regions of Type A and Type A1,”
Stat. Res. Mem., Vol. 1, p. 1.

[4] A. Wald, “Contributions to the theory of statistical estimation and testing hypotheses,”
Annals of Math. Stat., Vol. 10 (1939), p. 299.





A NOTE ON REGRESSION ANALYSIS 

By Abraham Wald 
Columbia University 

1. Introduction. In regression analysis a set of variables $y, x_1, \dots, x_p$
is considered where $y$ is called the dependent variable and $x_1, \dots, x_p$ are the
independent variables. Let $y_\alpha$ denote the $\alpha$th observation on $y$ and $x_{i\alpha}$ the
$\alpha$th observation on $x_i$, $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$. The observations $x_{i\alpha}$
are treated as given constants, while the observations $y_1, \dots, y_N$ are regarded
as chance variables. The following two assumptions are usually made concerning
the joint distribution of the variates $y_1, \dots, y_N$:

(a) The variates $y_1, \dots, y_N$ are normally and independently distributed with
a common unknown variance $\sigma^2$.

(b) The expected value of $y_\alpha$ is equal to $\beta_1 x_{1\alpha} + \dots + \beta_p x_{p\alpha}$ where $\beta_1, \dots,
\beta_p$ are unknown constants.

In some problems it seems reasonable to assume that the regression coefficients
$\beta_1, \dots, \beta_p$ are not constants, but chance variables. This leads to a different
probability model for regression analysis and the object of this note is to discuss
certain aspects of this model. In what follows in this note we shall make the
following assumptions concerning the joint distribution of the chance variables
$y_1, \dots, y_N, \beta_1, \dots, \beta_p$.

Assumption 1. For given values of $\beta_1, \dots, \beta_p$ the joint conditional probability
density function of $y_1, \dots, y_N$ is given by

(1.1) $\dfrac{1}{(2\pi)^{N/2}\sigma^{N}} \exp\left[-\dfrac{1}{2\sigma^2}\sum_{\alpha=1}^{N}\left(y_\alpha - \beta_1 x_{1\alpha} - \dots - \beta_p x_{p\alpha}\right)^2\right].$

Assumption 2. The regression coefficients $\beta_1, \dots, \beta_p$ are independently
distributed.

Assumption 3. The regression coefficients $\beta_1, \dots, \beta_r$, $(r \le p)$, are normally
distributed with zero means and a common variance $\sigma'^2$.

The purpose of this note is to derive confidence limits for the ratio $\sigma'^2/\sigma^2$. Such
confidence limits have been derived by the author [1] for analysis of variance
problems assuming that there are only main effects but no interactions. The
regression problem treated in the present note is much more general and includes
all the analysis of variance problems with or without interactions as
special cases.

It should be remarked that Assumptions 2 and 3 do not exclude the case where
$\beta_{r+1}, \dots, \beta_p$ are constants.

2. Derivation of confidence limits for the ratio $\sigma'^2/\sigma^2$. Let $b_1, \dots, b_p$ be the
sample estimates of $\beta_1, \dots, \beta_p$ obtained by the method of least squares. We





shall denote the difference $b_i - \beta_i$ by $\epsilon_i$, $(i = 1, \dots, p)$. It is known that for
given values of $\beta_1, \dots, \beta_p$ the conditional joint distribution of $\epsilon_1, \dots, \epsilon_p$ is
normal with zero means and variance-covariance matrix $\|c_{ij}\|\sigma^2$ where

(2.1) $\|c_{ij}\| = \|a_{ij}\|^{-1}$

and

(2.2) $a_{ij} = \sum_{\alpha=1}^{N} x_{i\alpha}x_{j\alpha}$, $(i, j = 1, \dots, p)$.

Since the conditional distribution of $\epsilon_1, \dots, \epsilon_p$ does not depend on the values
of $\beta_1, \dots, \beta_p$, the unconditional distribution of $\epsilon_1, \dots, \epsilon_p$ is the same as the
conditional one, and the set of variates $(\beta_1, \dots, \beta_p)$ is independently distributed
of the set $(\epsilon_1, \dots, \epsilon_p)$. From this and Assumptions 2 and 3 it follows
that $b_1, \dots, b_r$ have a joint normal distribution and that


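(2.3) $E\,b_i = 0$, $(i = 1, \dots, r)$

and

(2.4) $E\,b_i b_j = \left(c_{ij} + \delta_{ij}\dfrac{\sigma'^2}{\sigma^2}\right)\sigma^2$, $(i, j = 1, \dots, r)$

where $\delta_{ij} = 0$ for $i \ne j$ and $= 1$ for $i = j$.

We shall denote $\sigma'^2/\sigma^2$ by $\lambda$ and the elements of the inverse
of $\|c_{ij} + \delta_{ij}\lambda\|$ by $d_{ij}(\lambda)$, i.e.,

(2.5) $\|d_{ij}(\lambda)\| = \|c_{ij} + \delta_{ij}\lambda\|^{-1}$, $(i, j = 1, \dots, r)$.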


Then the quadratic form

(2.6) $Q(\lambda) = \dfrac{1}{\sigma^2}\sum_{j=1}^{r}\sum_{i=1}^{r} d_{ij}(\lambda)\, b_i b_j$

has the $\chi^2$ distribution with $r$ degrees of freedom.

It is known that for any given values of $\beta_1, \dots, \beta_p, b_1, \dots, b_p$ the quadratic
form

(2.7) $Q_a = \dfrac{1}{\sigma^2}\sum_{\alpha=1}^{N}\left(y_\alpha - b_1 x_{1\alpha} - \dots - b_p x_{p\alpha}\right)^2$

has the $\chi^2$ distribution with $N - p$ degrees of freedom provided that the rank
of the matrix $\|x_{i\alpha}\|$ is $p$. Hence $Q_a$ and $Q(\lambda)$ are independently distributed
and the ratio


(2.8) $F = \dfrac{N - p}{r}\,\dfrac{Q(\lambda)}{Q_a}$

has the $F$-distribution with $r$ and $N - p$ degrees of freedom.
Let $F_1$ and $F_2$ be two values chosen so that

(2.9) $\text{Prob.}\{F_1 \le F \le F_2\} = c$





where $c$ is a given positive constant less than 1. Then the set of all values $\lambda$
for which the inequality

(2.10) $F_1 \le \dfrac{N - p}{r}\,\dfrac{Q(\lambda)}{Q_a} \le F_2$

holds forms a confidence set for $\lambda$ with the confidence coefficient $c$.

We shall now show that $Q(\lambda)$ is a monotonic function of $\lambda$ and, therefore,
the confidence set determined by (2.10) is an interval. Let $\|g_{ij}\|$, $(i, j =
1, \dots, r)$, be an orthogonal matrix and let

(2.11) $b_i^* = \sum_{j=1}^{r} g_{ij} b_j.$

It then follows from (2.3) and (2.4) that


(2.12) $E(b_i^*) = 0$, $(i = 1, \dots, r)$

and

(2.13) $E(b_i^* b_j^*) = (c_{ij}^* + \delta_{ij}\lambda)\sigma^2$, $(i, j = 1, \dots, r)$

where

(2.14) $c_{ij}^* = \sum_{l=1}^{r}\sum_{k=1}^{r} g_{ik}\, g_{jl}\, c_{kl}.$


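Let

(2.15) $\|d_{ij}^*(\lambda)\| = \|c_{ij}^* + \delta_{ij}\lambda\|^{-1}$, $(i, j = 1, \dots, r)$

and put

$Q^*(\lambda) = \dfrac{1}{\sigma^2}\sum_{j=1}^{r}\sum_{i=1}^{r} d_{ij}^*(\lambda)\, b_i^* b_j^*.$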


It is easy to verify that $Q^*(\lambda)$ is identically equal to $Q(\lambda)$. Hence, to prove
the monotonicity of $Q(\lambda)$, it is sufficient to show that $Q^*(\lambda)$ is a monotonic function
of $\lambda$. Since no restrictions as to the choice of the orthogonal matrix $\|g_{ij}\|$
are made, we shall choose it so that the matrix $\|c_{ij}^*\|$ becomes diagonal, i.e.,
$c_{ij}^* = 0$ for $i \ne j$, $(i, j = 1, \dots, r)$. Then

(2.16) $d_{ij}^*(\lambda) = 0$ for $i \ne j$

and

(2.17) $d_{ii}^*(\lambda) = \dfrac{1}{c_{ii}^* + \lambda}.$

Hence

(2.18) $Q(\lambda) = Q^*(\lambda) = \dfrac{1}{\sigma^2}\sum_{i=1}^{r}\dfrac{b_i^{*2}}{c_{ii}^* + \lambda}$

is a monotonically decreasing function of $\lambda$. The confidence set determined by
(2.10) is, therefore, an interval.





The upper end point of the confidence interval is the root in $\lambda$ of the equation

(2.19) $\dfrac{N - p}{r}\,\dfrac{Q(\lambda)}{Q_a} = F_1$

and the lower end point is the root in $\lambda$ of the equation

(2.20) $\dfrac{N - p}{r}\,\dfrac{Q(\lambda)}{Q_a} = F_2.$

If equation (2.20) has no root, the lower end point of the confidence interval
is put equal to zero.
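
Because the left-hand side of these two equations is a decreasing function of λ, the end points can be located by bisection. The following is a minimal sketch under stated assumptions, not the author's computation: it supposes the matrix has already been diagonalized, so that the diagonal elements c*_ii and the transformed estimates b*_i are supplied directly, and that the residual sum of squares (σ² times Q_a) and the two chosen F values are given. The function names and arguments are hypothetical.

```python
def q_scaled(lam, c_star, b_star):
    """sigma^2 * Q(lambda): the sum of b_i*^2 / (c_ii* + lambda) over i."""
    return sum(b * b / (c + lam) for c, b in zip(c_star, b_star))


def root_in_lambda(target, c_star, b_star, iterations=200):
    """Root lambda >= 0 of q_scaled(lambda) = target, or None if none exists.

    Assumes target > 0, every c_ii* > 0 and some b_i* nonzero, so that
    q_scaled is strictly decreasing and tends to zero as lambda grows.
    """
    if q_scaled(0.0, c_star, b_star) < target:
        return None
    lo, hi = 0.0, 1.0
    while q_scaled(hi, c_star, b_star) > target:
        hi *= 2.0  # widen the bracket until the root is enclosed
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if q_scaled(mid, c_star, b_star) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_interval(c_star, b_star, resid_ss, n_obs, p, f1, f2):
    """End points (lower, upper) of the confidence interval for lambda.

    resid_ss is the residual sum of squares; sigma^2 cancels in Q / Q_a.
    The upper end point is None when even lambda = 0 gives a ratio below f1.
    """
    r = len(b_star)
    scale = r * resid_ss / (n_obs - p)  # ratio F equals q_scaled / scale
    upper = root_in_lambda(f1 * scale, c_star, b_star)
    lower = root_in_lambda(f2 * scale, c_star, b_star)
    if lower is None:
        lower = 0.0  # convention stated for the case of no root
    return lower, upper
```

In practice the two F values would be read from a table of the F-distribution with r and N - p degrees of freedom; they are passed in here so that the sketch needs no statistical library.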


REFERENCE 

[1] A. Wald, “On the analysis of variance in case of multiple classifications with unequal
class frequencies,” Annals of Math. Stat., Vol. 12 (1941).


ON THE SHAPE OF THE ANGULAR CASE OF CAUCHY’S 
DISTRIBUTION CURVES 

By Aurel Wintner 

The Johns Hopkins University 

1. Let $\xi$ be a linear random variable, that is, a random variable capable of
values $x$ represented by points of a line $-\infty < x < \infty$, and suppose, for simplicity,
that $\xi$ has a density of probability, $f(x)$. Then, subject to provisos of
convergence, the series

$F(x) = \sum_{n=-\infty}^{\infty} f(x + n)$

represents a periodic function, of period 1, having the following significance:
$F(x)$ is the density of probability of the angular random variable, say $\Xi$, which
is obtained if all the states

$\dots,\ \xi - 2,\ \xi - 1,\ \xi,\ \xi + 1,\ \xi + 2,\ \dots$

of the linear random variable are identified.

In other words, if a circle of unit circumference rolls from $-\infty$ to $\infty$ on the
$\xi$-line, then every point of the circumference collects the various densities of
probability attached to congruent points of the $\xi$-line, and a state of $\Xi$ represents
a point of the circumference. For a detailed study of the mapping $\xi \to \Xi$
cf. [2].

According to Poisson’s summation formula, the Fourier constants of the 
periodic function F(x) can be obtained by restricting u in g(u) to an equidistant 
sequence of discrete values, where $g(u)$ denotes the Fourier transform of $f(x)$;
cf., e.g., [5], p. 78 or [9], pp. 477-478.





2. Consider, in particular, the case in which $f(x)$ is the density of a symmetric
distribution which is stable in Cauchy’s sense. The determination of the totality
of these linear densities of probability is due to Lévy [6]. It was shown in [8]
that every such $f(x) = f(-x)$ is a decreasing function of $|x|$. As explained in
[8], p. 70, this fact makes superfluous one of the axioms occurring in Gauss’
postulational approach to “errors of observation.”

The purpose of the present note is the deduction of the angular analogue of
the fact just quoted. The analogue states that, if $f(x)$ is symmetric and stable,
then the corresponding periodic $F(x)$ is decreasing for $0 \le x \le \tfrac12$ (and so, for
reasons of symmetry, is increasing for $\tfrac12 \le x \le 1$). This is contained in the
italicized statement of §4 below.

In view of Poisson’s rule, quoted above, the periodic densities in question can
be defined by certain Fourier series representing generalizations of elliptic theta-series.
From this point of view, not even the existence (i.e., the positivity) of
the periodic densities is obvious, if arbitrary values of the “precision constant”
(denoted below by $q$) are allowed. The difficulties involved are explained in §3.

3. If $q$ and $\lambda$ are positive constants the first of which is less than 1, then the
(even, periodic) function

(1) $\theta_\lambda(x; q) = 1 + 2\sum_{n=1}^{\infty} q^{n^\lambda}\cos nx,$

where $q^{n^\lambda} > 0$, has derivatives of arbitrarily high order at every real $x$. It is
regular-analytic at every real $x$ if and only if $\lambda > 0$ is replaced by $\lambda \ge 1$, where
the sign of equality holds if and only if the analytic continuation (from the $x$-axis)
is not an entire function. In fact, it is known that a Fourier series
$\sum(a_n\cos nx + b_n\sin nx)$ is that of a function which is regular-analytic at every
real $x$, and has the period $2\pi$, if and only if $|a_n| + |b_n|$ is majorized by a constant
multiple of the $n$th power of a positive constant which is less than 1;
and that the latter constant can be chosen arbitrarily small if and only if the
analytic continuation does not lead to any singularity (at a $z \ne \infty$).

Since the function (1) tends to 1 uniformly in $x$ as $q \to +0$, if $\lambda$ is fixed, there
belongs to every $\lambda > 0$ a positive $q^* = q^*(\lambda)$ having the property that

(2) $\theta_\lambda(x; q) > 0$ for $0 \le x < 2\pi$

if $0 < q < q^*(\lambda)$. It is less obvious that, if $q$ is sufficiently small with reference
to $\lambda$, say if $0 < q < q^{**}(\lambda)$, then

(3) $\theta_\lambda(x; q)$ is decreasing for $0 \le x \le \pi$

(hence, increasing for $\pi \le x < 2\pi$). The existence of such a $q^{**}(\lambda) < \infty$ for
every $\lambda > 0$ can be assured as follows:

If $s_n(x)$ denotes the $n$th partial sum of the Fourier series $\sum(\sin nx)/n$, then
$s_n(x)$ is positive for $0 < x < \pi$ (Gronwall, Jackson; for a short proof, cf. [4]).





Hence, a partial summation shows that the sum of a sine series, $\sum b_n\sin nx$,
must be positive for $0 < x < \pi$ if

$nb_n - (n + 1)b_{n+1} > 0$ and $nb_n \to 0$.

Since the first derivative of (1) (with respect to $x$) results by choosing
$b_n = -2nq^{n^\lambda}$, it follows that (3) must be true if

$n^2q^{n^\lambda} - (n + 1)^2q^{(n+1)^\lambda} > 0$

holds for $n = 1, 2, \dots$. But the last inequality is readily seen to be satisfied
from $n = 1$ onward if, while $\lambda$ is fixed, $q$ tends to 0. This proves that $q^{**}(\lambda)$
exists for every $\lambda > 0$.

4. From these deductions alone, it is quite unexpected that (the best values of)
both $q^*(\lambda)$ and $q^{**}(\lambda)$ turn out to be independent of $\lambda$ when

(4) $0 < \lambda \le 2,$

i.e., that (1) satisfies both (2) and (3) for $0 < q < 1$, if (4) is assumed. This
fact is of statistical significance, since, on the one hand, it is precisely the restriction
(4) which is necessary and sufficient for the existence of Cauchy’s (symmetric)
“stable” distributions (cf. [6], pp. 254-263) and, on the other hand,
the reduction (mod $2\pi$) of the densities of these linear distributions leads to
the functions (1) as angular densities (cf. [9], pp. 477-478); the numerical value
of $q$ $(< 1)$ being determined by the “precision” or “dispersion” of the resulting
angular distributions.

Under the necessary restriction (4), the linear analogue of $q^*(\lambda) = 1$ and of
$q^{**}(\lambda) = 1$ was proved in [6], pp. 258-263 and in [8], pp. 71-77, respectively.
It will remain undecided whether the restriction (4) is necessary in either of
the angular cases.

5. Suppose that $\lambda$ has a fixed value in the range (4). Then there exists a
monotone function of $t$, say $\alpha_\lambda(t)$, for which

$\exp(-u^\lambda) = \int_0^\infty \exp(-u^2t)\,d\alpha_\lambda(t)$

is an identity in $u$, where $0 < u < \infty$ (cf. [1], p. 769, where further references
will be found). Hence, a change of variables shows that

$q^{n^\lambda} = \int_0^\infty q^{tn^2}\,d\alpha_\lambda\!\left(t\,|\log q|^{1-2/\lambda}\right)$

is an identity in $q$ and $n$, where $0 < q < 1$ and $n = 0, 1, 2, \dots$ (the integration
variable is $t$). Consequently, from (1),

$\theta_\lambda(x; q) = \int_0^\infty \theta_2(x; q^t)\,d\alpha_\lambda\!\left(t\,|\log q|^{1-2/\lambda}\right),$





where $0 < q < 1$ and $-\infty < x < \infty$. In fact, the legitimacy of the term-by-term
integration is obvious from $0 < q < 1$ and $d\alpha_\lambda \ge 0$ (even though the integrals
are improper).

6. Since $\alpha_\lambda$ is a non-decreasing function, it is clear from the last formula line
that both (2) and (3) will be proved for $0 < q < 1$ and for every $\lambda$ (satisfying (4)),
if it is ascertained that both (2) and (3) hold for $0 < q < 1$ when $\lambda = 2$. But
the case $\lambda = 2$ of (1) is an elliptic theta-function, for which both properties in
question (cf. the diagram in [3], p. 44) are known; a simple proof can be concluded
from what, in Hecke’s terminology, is the Eulerian factorization of
$\theta_2(x; q)$, as follows:

According to Jacobi, the factorization of the case $\lambda = 2$ of (1) is

$\theta_2(x; q) = \prod_{n=1}^{\infty}(1 - q^{2n})(1 + 2q^{2n-1}\cos x + q^{4n-2})$

(cf. [7], pp. 64-65). Thus

$\theta_2(x; q) = c_q\prod_{n=1}^{\infty} P(x + \pi;\ q^{2n-1}),$

where

$c_q = \prod_{n=1}^{\infty}(1 - q^{2n})$

and

(5) $P(x; r) = 1 - 2r\cos x + r^2$, $(0 < r < 1)$,

hence

$P(x; r) > 0$ $(0 < r < 1)$.

Since $0 < q < 1$, this proves the case $\lambda = 2$ of (2). Furthermore, logarithmic
differentiation of the product representation of $\theta_2(x; q)$ gives

$\theta_2'(x; q) = \theta_2(x; q)\sum_{n=1}^{\infty} P'(x + \pi;\ q^{2n-1})/P(x + \pi;\ q^{2n-1}),$

where $f' = df/dx$; so that, by (5),

$P'(x + \pi;\ r) = -2r\sin x.$

Since $0 < q < 1$, the last three formula lines and the case $\lambda = 2$ of (2) imply that

$\theta_2'(x; q) < 0$ if $0 < x < \pi$,

as claimed by the case $\lambda = 2$ of (3).

This completes the proof of the italicized assertion.
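
The positivity and monotonicity asserted for the series (1), and Jacobi's factorization for the case λ = 2, can be spot-checked numerically. The sketch below is only an illustration on a finite grid with truncated sums; the truncation lengths, the grid size and all function names are assumptions of the example, and it proves nothing beyond the sampled points.

```python
import math


def theta(x, q, lam, terms=200):
    """Truncated series (1): 1 + 2 * sum over n of q**(n**lam) * cos(n x)."""
    return 1.0 + 2.0 * sum(
        q ** (n ** lam) * math.cos(n * x) for n in range(1, terms + 1)
    )


def theta2_product(x, q, factors=60):
    """Truncated Jacobi product for the case lam = 2."""
    value = 1.0
    for n in range(1, factors + 1):
        r = q ** (2 * n - 1)
        value *= (1.0 - q ** (2 * n)) * (1.0 + 2.0 * r * math.cos(x) + r * r)
    return value


def positive_and_decreasing(q, lam, points=50):
    """Check theta > 0 and strictly decreasing on an even grid over [0, pi]."""
    values = [theta(k * math.pi / points, q, lam) for k in range(points + 1)]
    all_positive = all(v > 0.0 for v in values)
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return all_positive and decreasing
```

For λ = 1 the series is the classical Poisson kernel (1 - q²)/(1 - 2q cos x + q²), which gives an independent closed form to compare against. Values of q close to 1 combined with small λ would need many more terms than the default assumed here.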





REFERENCES 

[1] P. Hartman and A. Wintner, “On the spherical approach to the normal distribution
law,” Amer. Jour. of Math., Vol. 62 (1940), pp. 769-779.

[2] E. K. Haviland, “On the distribution functions of the reciprocal of a function and of
a function reduced mod 1,” Amer. Jour. of Math., Vol. 63 (1941), pp. 408-414.

[3] E. Jahnke and F. Emde, Tables of Functions with Formulae and Curves, 3d ed. (1938),
reprinted by G. E. Stechert and Co., New York.

[4] E. Landau, “Ueber eine trigonometrische Ungleichung,” Math. Zeit., Vol. 37 (1933),
p. 36.

[5] H. Lebesgue, Leçons sur les séries trigonométriques, Paris, 1905.

[6] P. Lévy, Calcul des probabilités, Paris, 1925.

[7] H. Weber, Elliptische Funktionen und algebraische Zahlen, Braunschweig, 1891.

[8] A. Wintner, “On a class of Fourier transforms,” Amer. Jour. of Math., Vol. 58 (1936),
pp. 45-90 and p. 425.

[9] F. Zernike, “Wahrscheinlichkeitsrechnung und mathematische Statistik,” Handbuch
der Physik, Vol. 3 (1928), pp. 419-492.


A NOTE ON THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS

By G. E. Albert

U. S. Naval Ordnance Plant, Indianapolis

1. Introduction. Let $\{z_i\}$, $(i = 1, 2, 3, \dots)$, be a sequence of real valued
random variables identically distributed according to the cumulative distribution
function $F(z)$. Define the sums $Z_N = z_1 + z_2 + \dots + z_N$ for every positive
integer $N$. Choose two positive constants $a$ and $b$ and define the random variable
$n$ as the smallest integer $N$ for which one of the inequalities $Z_N \ge a$ or
$Z_N \le -b$ holds. The notations $P(u \mid F)$ and $E(u \mid F)$ will denote the probability
of $u$ and its expectation respectively assuming that $F$ is the distribution of the $z_i$.

Wald [1] has established the results contained in the following lemmas. 

Lemma 1. If the variance of $F(z)$ is positive, $P(n < \infty \mid F)$ equals one.

Lemma 2. If there exists a positive number $\delta$ such that $P(e^z < 1 - \delta \mid F) > 0$
and $P(e^z > 1 + \delta \mid F) > 0$ and if the moment generating function $\varphi(t) = E(e^{zt} \mid F)$
exists for all real values of $t$, then $\varphi(t)$ has one and only one minimum at some finite
value $t = t_0$. Moreover, $\varphi''(t) > 0$ for all real values of $t$.

It is the purpose of this note to establish the following extension of the validity 
of certain results given by Wald [1], [2]. 

Theorem.$^1$ Under the conditions of Lemma 2 the identity

(1) $E\{e^{Z_n t}[\varphi(t)]^{-n} \mid F\} = 1$

is valid and may be differentiated with respect to $t$ under the expectation sign any
number of times for all real values of $t$.

$^1$ Wald’s results show (1) to be valid for all complex $t$ in the domain over which $|\varphi(t)| \ge 1$
and the validity of the differentiation clause for all real $t$ in that domain. The importance
of the present extension arises from the fact that, if $E(z \mid F) \ne 0$, then $0 < \varphi(t) < 1$
on a certain interval of the real axis.

Proof. The notation $t_0$ will be used consistently to denote the $t$ value at which
$\varphi(t)$ has its minimum.

The proof of the theorem follows Wald’s methods quite closely and certain 
of the results given in [1] and [2] will be used here without discussion. 

Consider first the validity of (1). For an arbitrary positive integer $N$ let $P_N$
be the probability $P(n \le N \mid F)$ and let $E_N(u \mid F)$ and $E_N^*(u \mid F)$ denote the
conditional expectations of $u$ subject to the respective conditions $n \le N$ and
$n > N$. Wald [1] has shown that for any finite real value of $t$

(2) $P_N E_N\{e^{Z_n t}[\varphi(t)]^{-n} \mid F\} + (1 - P_N)[\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\} = 1.$

Since $\lim_{N\to\infty} P_N E_N\{[\varphi(t)]^{-n}\exp(Z_n t)\}$ is the left member of the identity (1), it suffices
to demonstrate that

(3) $\lim_{N\to\infty}(1 - P_N)[\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\} = 0$

for all real values of $t$.

Since $1 - P_N$ tends to zero with increasing $N$ and the expected value $E_N^*$
involved in (3) is bounded independently of $N$ for any fixed $t$, the only source of
difficulty in proving (3) lies in the fact that $\varphi(t)$ may be less than unity on an
interval of the real axis. That difficulty is easily avoided by the following
device. Define the function

(4) $G(x) = [\varphi(t_0)]^{-1}\int_{-\infty}^{x} e^{t_0 z}\,dF(z).$

Obviously $G(x)$ is a distribution function whose moment generating function $\psi(t)$
exists for all real $t$. Its mean is zero and its variance is positive as will be
seen from the equations $E(z \mid G) = \varphi'(t_0)/\varphi(t_0) = 0$ and $E(z^2 \mid G) = \varphi''(t_0)/\varphi(t_0)$. It
follows that $\psi(t)$ is never less than unity for real values of $t$.

Let $\Omega$ denote the space of all $z_1, \dots, z_N$ and let $\Omega(n > N)$ be that subset of $\Omega$
on which $n > N$. One has

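$(1 - P_N)[\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\}
= \dfrac{\int_{\Omega(n>N)} e^{Z_N t}\,dF(z_1)\cdots dF(z_N)}{\int_{\Omega} e^{Z_N t}\,dF(z_1)\cdots dF(z_N)}
= \dfrac{\int_{\Omega(n>N)} e^{Z_N s}\,dG(z_1)\cdots dG(z_N)}{\int_{\Omega} e^{Z_N s}\,dG(z_1)\cdots dG(z_N)}
= (1 - Q_N)[\psi(s)]^{-N} E_N^*\{e^{Z_N s} \mid G\},$

where $s = t - t_0$ and $Q_N = P(n \le N \mid G)$. By Lemma 1, $1 - Q_N$ tends to
zero as $N$ is increased. Thus, since $\psi(s) \ge 1$ for all real $t$ and the expected value
$E_N^*\{e^{Z_N s} \mid G\}$ is bounded independently of $N$ for a fixed $t$, the equation (3) holds
for all real $t$.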





The differentiability clause of the theorem requires the following modification
of a very powerful theorem due to Charles Stein [3].

Lemma 3. Under the conditions of Lemma 2, if the minimum $\varphi(t_0)$ of $\varphi(t)$ is
less than unity, there exists a positive number $t_1$ such that

(5) $E\{\exp[nt_1 - n\log\varphi(t_0)] \mid F\} < \infty.$

Proof. If $G$ is the distribution of the $z_i$, by Stein’s theorem there exists a
positive number $t_1$ such that $E(e^{nt_1} \mid G)$ is finite. Let $\Omega(n = N)$ denote the
subset of $\Omega$ on which $n = N$. Then

$P(n = N \mid G) = \int_{\Omega(n=N)} dG(z_1)\cdots dG(z_N)$

$= [\varphi(t_0)]^{-N}\int_{\Omega(n=N)} e^{t_0 Z_N}\,dF(z_1)\cdots dF(z_N)$

$\ge P(n = N \mid F)\exp[\min\{at_0, -bt_0\} - N\log\varphi(t_0)].$

It follows that

$E\{\exp[nt_1 - n\log\varphi(t_0)] \mid F\} \le E\{e^{nt_1} \mid G\}\exp[-\min\{at_0, -bt_0\}]$

and the lemma is proved.

To continue with the theorem, Wald’s proof [2] suffices for the case in which
$\varphi(t_0) \ge 1$. Attention will be given only to the case $\varphi(t_0) < 1$. As pointed out in
section 2 of [2], the differentiability clause of the theorem will be established if
it can be shown that for any finite interval $I$ of the real axis and any pair of
integers $r_1$ and $r_2$ there exists a function $D_{r_1 r_2}(Z_n, n)$ such that for all $t$ in $I$
one has

(6) $D_{r_1 r_2}(Z_n, n) \ge \left|\,n^{r_1} Z_n^{r_2} e^{Z_n t}[\varphi(t)]^{-n}\right|$

and

(7) $E\{D_{r_1 r_2}(Z_n, n) \mid F\} < \infty.$

On referring to Wald’s proof and using the inequality $-\log\varphi(t) \le -\log\varphi(t_0)$ for
all $t$ in $I$, it is seen that there exists a constant $C$ and a positive number $t_2$ such
that the function

$D_{r_1 r_2}(Z_n, n) = Cn^{r_1}[\varphi(t_0)]^{-n}(e^{t_2 Z_n} + e^{-t_2 Z_n})$

satisfies (6) for all $t$ in $I$. To establish (7) use the inequalities (2.4) and (2.6)
in Wald [2] to obtain

E{D ri r t i w) | F) 

~C±P(n = N\ F) | F\ 

N -1 

g Cje 11 * l(ti) + e~ bt, l(,-li)\E[ exp[rilogn- nlogv(<o)]|F). 


( 8 ) 





That (7) is indeed satisfied now follows from (5) and the finiteness of the function
$l(t)$ since for a large enough integer $M$ one has

$\sum_{N=M}^{\infty} P(n = N \mid F)\exp[r_1\log N - N\log\varphi(t_0)]
\le \sum_{N=M}^{\infty} P(n = N \mid F)\exp[Nt_1 - N\log\varphi(t_0)] < \infty.$

Thus the expected value on the extreme right in (8) is finite. This completes
the proof of the theorem.


REFERENCES 

[1] A. Wald, “On cumulative sums of random variables,” Annals of Math. Stat., Vol. 15
(1944), pp. 283-285.

[2] A. Wald, “Differentiation under the expectation sign in the fundamental identity of
sequential analysis,” Annals of Math. Stat., Vol. 17 (1946), pp. 493-496.

[3] Charles Stein, “A note on cumulative sums,” Annals of Math. Stat., Vol. 17 (1946),
pp. 498-499.


A SIGNIFICANCE TEST AND ESTIMATION IN THE CASE OF 
EXPONENTIAL REGRESSION 

By D. S. Villars$^1$

United States Rubber Company, Passaic, N. J.

1. Introduction. The principal problem under consideration in this note
may be described as follows. Consider a variate, $z$, whose distribution for a
given value of a fixed variate, $t$, is:

(1.1) $f(z \mid t) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{(z - a + be^{-kt})^2}{2\sigma^2}\right]$

where $a$, $b$, and $k$ are real-valued parameters. The regression of $z$ on $t$ is exponential,
for it follows from (1.1) that the expected value of $z$, given $t$, is:

(1.2) $E(z \mid t) = a - be^{-kt}.$

On the basis of a random sample $O_N(z_1, t_1; z_2, t_2; \dots; z_N, t_N)$ it is desired to
test whether $k = 0$ or $\infty$. The problem of “fitting” a curve, $z = a - be^{-kt}$,
to the sample (i.e. of estimating $a$, $b$, and $k$ from the sample) will also be treated.
As an illustration of how the statistical problems described above arise in 


1 Present address, Jersey City Junior College, Jersey City, N. J. 





practice, let us consider a typical situation in industrial chemistry. Let the
quantity, $z$, be a property of a latex and let the quantity, $t$, be time. Suppose,
furthermore, that measurement of $t$ is without error but that measurement of $z$
is subject to error; let it be assumed that the observed value in a measurement of
$z$ is a variate having a normal (Gaussian) distribution about the “true value,”
$E(z)$. On the basis of $N$ independent measurements, $z_1, z_2, \dots, z_N$ of $z$ at times,
$t_1, t_2, \dots, t_N$, respectively, the experimenter may wish to test the hypothesis
that $k = 0$ or $\infty$. If this hypothesis is true the suspected exponential relation
between $z$ and $t$ does not hold; in this case $E(z)$ is a constant ($a - b$ or $a$) and
estimation of the constant from the data is quite straightforward. If the data
conflict with the hypothesis that $k = 0$ or $\infty$, the experimenter may wish to
estimate the parameters, $a$, $b$, and $k$ (i.e. “fit” the curve, $z = a - be^{-kt}$, to the
data).

The problems considered in this note will be treated only for the case where $N$
is an even integer $(\ge 6)$ and the times $t_1, t_2, \dots, t_N$ at which measurements of
$z$ are made are such that

(1.3) $t_{2\alpha} - t_{2\alpha-1} = \Delta$, a constant, $(\alpha = 1, 2, \dots, n = N/2)$.

The odd time intervals, $t_3 - t_2$, $t_5 - t_4$, etc. do not have to be equal.

2. Test of the hypothesis that $k = 0$ or $\infty$. The space, say $\Omega$, of admissible
values of the parameters in (1.1) is: $\sigma^2 > 0$, $-\infty < a, b, k < +\infty$. Under the
null hypothesis the admissible values of the parameters lie in a subspace of $\Omega$,
say $\omega$, specified as follows: $\sigma^2 > 0$, $-\infty < a, b < +\infty$, $k = 0$ or $\infty$.

Let $y_j = z_{2j}$ and $x_j = z_{2j-1}$, $(j = 1, \dots, n = N/2)$. From (1.1) and (1.3)
it follows that the $n$ pairs $x_j, y_j$ are normally and independently distributed with
common variance, $\sigma^2$, that $x_j$ and $y_j$ are independent $(j = 1, 2, \dots, n)$, and
that

(2.1) $\nu_j = h + m\mu_j$

where $\nu_j = E(y_j)$, $\mu_j = E(x_j)$, $h = a(1 - e^{-k\Delta})$, and $m = e^{-k\Delta}$. The space,
$\Omega'$, of admissible values of the parameters in the joint distribution of $x_j, y_j$,
$(j = 1, \dots, n)$, is: $\sigma^2 > 0$, $\nu_j = h + m\mu_j$, $-\infty < h < +\infty$, $-\infty < \mu_j, \nu_j <
+\infty$; $0 < m < \infty$. The subspace of $\Omega'$, say $\omega'$, associated with the null hypothesis
is: $\sigma^2 > 0$, $\nu_j = \mu_j = c$, where $c = a - b$ or $a$ according as $k = 0$ or $\infty$.
In $\Omega'$, the expected values of $x$ and $y$ lie on a line; in $\omega'$ they lie in a single point.
It is clear that by transforming the original sample $O_N(z_1, t_1; \dots; z_N, t_N)$ to a
sample $O_n(x_1, y_1; \dots; x_n, y_n)$ we have reduced the original problem to the
familiar problem of linear regression in which there is “error in both variates”.

The slope of the “line of best fit” to the sample points $(x_1, y_1; \dots; x_n, y_n)$
is [1]:

(2.2) $\hat m = \left[S_{yy} - S_{xx} + \sqrt{(S_{yy} - S_{xx})^2 + 4S_{xy}^2}\,\right]\big/\,2S_{xy}$





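where

$S_{xx} = \sum_{j=1}^{n}(x_j - \bar x)^2$

$S_{xy} = \sum_{j=1}^{n}(x_j - \bar x)(y_j - \bar y)$

$S_{yy} = \sum_{j=1}^{n}(y_j - \bar y)^2$

$\bar x = \sum_{j=1}^{n} x_j/n$

$\bar y = \sum_{j=1}^{n} y_j/n$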

($\hat m$ is an estimate of $m$ in (2.1)). Since $m = e^{-k\Delta}$ (where $k$ and $\Delta$ are real), it is
intuitively clear that when $\hat m$ is non-positive the sample $O_N$ does not conflict
with the null hypothesis. The null hypothesis can be tested by means of the
statistic [2, 144]

(2.3) $F' = \dfrac{S_{xx} + 2\hat mS_{xy} + \hat m^2S_{yy}}{\hat m^2S_{xx} - 2\hat mS_{xy} + S_{yy}}.$

The null hypothesis is rejected if $\hat m$ is positive and $F'$ is large. Percentage points
of the distribution of $F'$ are given in [2, 146] for $n = 3\,(1)\,15\,(5)\,30, 40, 60, 120$
and for significance levels, 0.001, .01, .05, .10, and .20. These significance
levels, however, were computed for use in cases where the sign of $\hat m$ was irrelevant.
It happens that to test the null hypothesis under consideration in this
problem at a significance level $\alpha$ we should use a critical value of $F'$ (given in
[2]) corresponding to a significance level $2\alpha$. The reason for this is that when
the null hypothesis is true the quantities $\hat m$ and $F'$ are independent and the
probability that $\hat m$ is positive is $\tfrac12$; thus the chance of rejecting the null hypothesis
is $\tfrac12(2\alpha) = \alpha$.

3. Estimation of $a$, $b$, and $k$. If the data do not support the hypothesis that
$k = 0$ or $\infty$, the experimenter may wish to estimate $a$, $b$, and $k$. General alternative
methods of estimating these parameters will now be considered.

(1) Estimate $a$, $b$, and $k$ from $O_N$ by the method of least squares; i.e., solve
the simultaneous equations $\partial S/\partial a = 0$, $\partial S/\partial b = 0$, and $\partial S/\partial k = 0$ for $a$, $b$,
and $k$, where

(3.1) $S = \sum_{i=1}^{N}(z_i - a + be^{-kt_i})^2.$

The value of $k$ obtained by this method of estimation will not in general be the
same as that computable from $\hat m$ in (2.2) and used for the significance testing.

(2) Estimate $k$ by means of (2.2) and the relation $m = e^{-k\Delta}$, then substitute
this estimate into $S$ of (3.1) and estimate $a$ and $b$ by means of least squares.





(3) Estimate $k$ as in (2) and choose, as an estimate of $a$, the intercept of the
“line of best fit” for $O_n$. Then substitute these estimates of $a$ and $k$ into (3.1)
and estimate $b$ by means of least squares. In this case the estimate of $b$ comes
out to be:

(3.2) $\hat b = \sum_{i=1}^{N} e^{-\hat kt_i}(\hat a - z_i)\Big/\sum_{i=1}^{N} e^{-2\hat kt_i}$

where $\hat a$ and $\hat k$ are the estimates of $a$ and $k$.

If the values, $t_1, t_2, \dots, t_N$ are such that $t_{i+1} - t_i = \Delta$, $(i = 1, 2, \dots, N - 1)$,
the following estimation procedure might be used.

(4) Let

$y_i = z_{i+1}$, $x_i = z_i$, $(i = 1, 2, \dots, N - 1)$,

and treat the $(N - 1)$ pairs of values $(x_1, y_1; \dots; x_{N-1}, y_{N-1})$ as a sample of
size $(N - 1)$. Using this sample, estimate $k$, $a$, and $b$ in a manner similar to
that in (2) or (3). It should be noted that this sample is not a random sample
owing to the dependence among the $(N - 1)$ elements.

The procedure in alternative (1) is very laborious and time-consuming. The 
procedure in (2) and (3) can be carried out quickly and easily. In (1) the 
method of least squares yields the same results as would be obtained from appli¬ 
cation of the method of maximum likelihood. Examples of estimation by proce¬ 
dures (3) and (4) are given in the next section. 

4. Example. The accompanying table lists experimentally observed values
of a property of a latex obtained at biweekly intervals. Using the first, third,
etc., quantities as $x_j$ and the remaining ones as $y_j$, the sums of squares and products
of deviations are found to be:

$S_{xx} = .035510$, $\bar x = 0.9195$

$S_{xy} = .025645$

$S_{yy} = .023414$, $\bar y = .9365$.

Substituting these values in equation (2.2) and computing the other constants
from equation (2.1) we get: $\hat m = 0.791596$, $\hat a = 1.0009$, and $\hat k = 0.1168$. The
$F'$ ratio (2.3) is 17.03. Entering Table I of [2], we find that for eight point pairs
a value of $F' = 16.5$ may be expected only one time in one hundred. On excluding
the possibility of negative values of $\hat m$, this corresponds to the 0.5%
significance level. The exponential relationship is thus concluded to be highly
significant.
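
The chain of computations in procedure (3) can be sketched in a few lines. This is an illustrative reading, not the author's worksheet: it assumes that the line of best fit passes through the point of means, so that h is the ordinate mean minus the slope times the abscissa mean and a = h/(1 - m) by (2.1), and it takes Δ = 2 weeks for biweekly readings. The function names are invented for the example.

```python
import math


def kummell_slope(sxx, sxy, syy):
    """Slope of the line of best fit, equation (2.2)."""
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def f_prime(m, sxx, sxy, syy):
    """Test statistic (2.3) for a given slope estimate m."""
    numerator = sxx + 2.0 * m * sxy + m * m * syy
    denominator = m * m * sxx - 2.0 * m * sxy + syy
    return numerator / denominator


def estimate_a_k(m, x_bar, y_bar, delta):
    """Estimates of a and k from the slope, using m = exp(-k * delta).

    Assumes the fitted line passes through (x_bar, y_bar), so that
    h = y_bar - m * x_bar and a = h / (1 - m). Requires 0 < m < 1.
    """
    k = -math.log(m) / delta
    h = y_bar - m * x_bar
    a = h / (1.0 - m)
    return a, k


def estimate_b(a, k, times, z):
    """Least-squares estimate of b for fixed a and k, equation (3.2)."""
    numerator = sum(math.exp(-k * t) * (a - zi) for t, zi in zip(times, z))
    denominator = sum(math.exp(-2.0 * k * t) for t in times)
    return numerator / denominator
```

For instance, `kummell_slope(0.035510, 0.025645, 0.023414)` followed by `estimate_a_k(m, 0.9195, 0.9365, 2.0)` retraces the example from its printed sums; because those sums are rounded, the last digits need not match the printed estimates.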

Evaluation of $b$ by equation (3.2), method 3, gives 0.2560, if all 16 values are
used. The equation calculated from the data is thus:

(4.1) $z = 1.0009 - 0.2560\,e^{-0.1168t}.$





The alternative procedure, method 4, would be to use all the $z_i$ points for the
estimation of $a$ and $k$. This leads to the following values of the computation
quantities:

$S_{xx} = \sum_{i=1}^{16} x_i^2 - x_{16}^2 = 0.052374$; $\bar x = 0.9223$

$S_{xy} = \sum_{i=1}^{15} x_ix_{i+1} = .036924$

$S_{yy} = \sum_{i=1}^{16} x_i^2 - x_1^2 = .035436$; $\bar y = .9381$.

Note that the difference $S_{yy} - S_{xx}$ used in the formula for $\hat m$ cancels out all intervening
squares between the first and last.

$S_{yy} - S_{xx} = x_{16}^2 - x_1^2.$

TABLE I

 t (weeks)   z_i          t (weeks)   z_i          t (weeks)   z_i      t (weeks)   z_i
     1       .776             9       .939            17       .942        25       .955
     3       .852            11       [illegible]     19       .938        27       .993
     5       [illegible]     13       [illegible]     21       .979        29       .985
     7       .869            15       .948            23       .975        31      1.013


However, the data excluded thereby are in effect included in the new $S_{xy}$.

The final values obtained by the fourth procedure are: $\hat m = 0.796596$, $\hat a =
1.0000$, and $\hat k = 0.1137$. The writer does not know whether the peculiar transference
of data from $S_{yy} - S_{xx}$ to $S_{xy}$ characteristic of procedure 4 improves the
accuracy of the fit or hurts it. It is his personal preference to use procedure 3.

5. Acknowledgement. The writer wishes to acknowledge with thanks his
gratitude to Drs. T. W. Anderson, Jr. and David F. Votaw, Jr. for many sug¬ 
gestions and discussions concerning this problem and for much help in clarifying 
the presentation of the concepts. 

REFERENCES 

[1] Charles H. Kummell, The Analyst (Des Moines), Vol. 6 (1879), pp. 97-105. 

[2] D. S. Villars and T. W. Anderson, Jr., “Some significance tests for normal bivariate
distributions,” Annals of Math. Stat., Vol. 14 (1943), pp. 141-148.














ON THE POWER EFFICIENCY OF A t-TEST FORMED 
BY PAIRING SAMPLE VALUES 

By John E. Walsh 
Princeton University 

1. Introduction. Consider two equal sized samples, one from a normal
population with mean $\mu$ and the other from a normal population with mean $\nu$. Let
$x_1, \cdots, x_n$ be the sample values from the population with mean $\mu$ and $y_1, \cdots, y_n$
the values from the population with mean $\nu$. If the two populations have the
same variance and the two samples are independent, the most powerful tests
for comparing $\mu$ and $\nu$ using these samples (one-sided and symmetrical
two-sided) are based on the statistic

$$t_2 = \frac{[\bar{x} - \bar{y} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

which has a Student t-distribution with 2n - 2 degrees of freedom. Tests based
on $t_2$ also have the desirable property of being invariant under permutation of
the data in each sample.

Sometimes, however, it is useful to combine the sample values in the form

$$z_i = (x_i - y_i), \qquad (i = 1, \cdots, n).$$

Examples:

(a). When the samples are independent but it is not known that the two
populations have the same variance (Behrens-Fisher problem).

(b). When there may be correlation between $x_i$ and $y_i$, $(i = 1, \cdots, n)$,
this correlation being the same for each value of $i$ (i.e. $x_i$ is independent of $y_j$
if $i \neq j$ while each pair $x_i, y_i$, $(i = 1, \cdots, n)$, has the same normal bivariate
distribution).

In both (a) and (b) the $z_i$ are independently normally distributed with the
same variance and mean $\mu - \nu$.

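The Student t-test for comparing $\mu$ and $\nu$ using the $z_i$ is based on the statistic

$$t_1 = \frac{[\bar{z} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n} (z_i - \bar{z})^2}} = \frac{[\bar{x} - \bar{y} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n} [(x_i - y_i) - (\bar{x} - \bar{y})]^2}},$$

which has a Student t-distribution with n - 1 degrees of freedom. These tests
are not invariant under permutation of the data in each sample.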
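
As a small illustration of the two statistics, the following sketch computes the pooled statistic (2n - 2 degrees of freedom) and the statistic formed from the paired differences (n - 1 degrees of freedom) for two equal sized samples. The function names and the sample values are made up for the example, and the hypothetical difference of means defaults to zero.

```python
import math


def pooled_t(x, y, diff=0.0):
    """Pooled two-sample statistic with 2n - 2 degrees of freedom."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    return (xbar - ybar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


def paired_t(x, y, diff=0.0):
    """Statistic formed from z_i = x_i - y_i, with n - 1 degrees of freedom."""
    n = len(x)
    z = [a - b for a, b in zip(x, y)]
    zbar = sum(z) / n
    ss = sum((v - zbar) ** 2 for v in z)
    return (zbar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 1.0, 3.0]
print(paired_t(x, y), pooled_t(x, y))

# Re-pairing the y values changes the paired statistic but not the pooled one.
print(paired_t(x, y[::-1]), pooled_t(x, y[::-1]))
```

The second call shows the lack of invariance noted above: permuting one sample leaves the pooled statistic unchanged but alters the statistic built from the differences.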

If it is true that all the sample values are independently distributed with the
same variance $\sigma^2$, efficiency will be lost by using the test based on $t_1$ instead of
the most powerful test based on $t_2$. The purpose of this note is to determine the
power efficiency of the tests based on $t_1$ as compared with the corresponding
tests based on $t_2$ for this case.




TABLE I 

Power Function Values for the $t_1$ and $t_2$ Tests















Consideration is limited to one-sided tests, which is not a serious limitation
since any two-sided test can be considered as a combination of two one-sided
tests. Table II contains approximate power efficiencies of one-sided tests for
$n \ge 4$ at the significance levels $\alpha$ = .05, .025, .01.

It is found that the efficiency of the $t_1$ test increases with the sample size but
is high even for small size samples.

2. Outline of computations. The method of obtaining power efficiencies
used here will be that outlined in [1]. Essentially this consists in computing the
power function for the test based on $t_1$ and then adjusting the sample size for
the corresponding test based on $t_2$ until its power function is approximately the
same as for the $t_1$ test. The ratio of the sample size (perhaps fractional) of the
adjusted $t_2$ test to that of the $t_1$ test is called the power efficiency of the $t_1$ test.
Intuitively this efficiency measures the fraction of the total available information
which is being used when the $t_1$ test is applied (since the $t_2$ test is most powerful).

TABLE II

Approximate Power Efficiencies for Given n and $\alpha$

 $\alpha$ \ n    4       5       6       7       8       9      10      15      25    $\infty$
   .05     82.5%   85%     87%     88.5%   90%     91%     92%     95.5%   98%     100%
   .025    77%*    80%*    82.5%   84.5%   86.5%   88.5%   90%     93%     96%     100%
   .01     73%     75.5%   78%     80%     82%     84%     85.5%   90%     94.5%   100%

* These values were obtained by comparison with the corresponding values for
$\alpha$ = .05 and .01.


It is easily seen from symmetry that a one-sided $t_1$ test of $\mu < \nu$ has the same
power efficiency as the corresponding one-sided $t_1$ test of $\mu > \nu$. Thus it is
sufficient to consider the one-sided tests of $\mu > \nu$.

The power function is found as a function of the parameter $\delta$, where



Most of the approximate power efficiencies were determined by using the
normal approximation given in [2] to compute the power function values. This
approximation was used for fractional values of n. Table I contains the results
of these computations for one-sided tests of $\mu > \nu$.
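
The sample size adjustment described in this section can be sketched as follows. This is only an illustration under several assumptions: the normal approximation coded here is a commonly quoted one for the non-central t-distribution and may differ in detail from the form given in [2]; the powers are matched at a single, arbitrarily chosen value of the non-centrality parameter (called `delta` below) rather than over the whole power function; and the non-centrality of the pooled test with m values per sample is taken to be `delta * sqrt(m / n)`. All function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def t_cdf(t, f, steps=400):
    """P(T <= t), t >= 0, for Student's t with f degrees of freedom (Simpson's rule)."""
    log_c = math.lgamma((f + 1) / 2) - math.lgamma(f / 2) - 0.5 * math.log(f * math.pi)

    def pdf(x):
        return math.exp(log_c - (f + 1) / 2 * math.log1p(x * x / f))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_upper_point(alpha, f):
    """Value t0 with P(T > t0) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, f) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power(alpha, f, delta):
    """Approximate power of the one-sided t test (assumed normal approximation)."""
    t0 = t_upper_point(alpha, f)
    x = (t0 * (1 - 1 / (4 * f)) - delta) / math.sqrt(1 + t0 * t0 / (2 * f))
    return 1 - PHI.cdf(x)


def power_efficiency(n, alpha, delta=2.0):
    """Ratio m / n, where the pooled test with m values per sample matches the paired power."""
    target = power(alpha, n - 1, delta)
    lo, hi = 1.5, float(n)
    for _ in range(30):
        m = (lo + hi) / 2
        p = power(alpha, 2 * m - 2, delta * math.sqrt(m / n))
        if p < target:
            lo = m
        else:
            hi = m
    return (lo + hi) / 2 / n


print(power_efficiency(10, 0.05))
```

Because the match is made at one point only, the result should be read as a rough check on the order of magnitude of the tabulated efficiencies, not as a reproduction of them.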

Exact values of the power function for integral values of n and $\alpha$ = .05, .01
can be found from the tables in [3]. A comparison of the power function values
obtained from the normal approximation with these exact values shows that,
for $n \le 6$, $\alpha$ = .01 and $n \le 4$, $\alpha$ = .05, .025, the approximation underestimates
the true values for small $\delta$ and overestimates for large $\delta$. Although this
combination of underestimation and overestimation tends to cancel out in the
determination of power efficiencies, so that little error in power efficiencies would be
expected if the approximation were used for n = 6, $\alpha$ = .01 or n = 4, $\alpha$ = .05,
the efficiencies given in Table II for n = 4, $\alpha$ = .05 and n = 4, 6, $\alpha$ = .01 were
obtained from the exact values by graphical interpolation and cross-interpolation.

Power efficiencies were not considered for n < 4 because of the difficulties
of interpolation and the inexactness of the normal approximation in this range.

For $n = \infty$, $t_1$ and $t_2$ both have a normal distribution with zero mean and unit
variance. Thus the power efficiency is 100% at all significance levels for
this case.

These computations furnish approximate power efficiencies for n = 6, 8, 10,
15, 25, $\infty$ at $\alpha$ = .05, .025, .01, and for n = 4 at $\alpha$ = .05 and .01. The other
approximate power efficiencies listed in Table II were obtained by graphical
interpolation from these values.

The results of this note can be roughly summarized for n < 15 by stating
that of the 2n sample values

(i) approximately 1.6 values are lost at the 5% significance level,

(ii) approximately 2.1 values are lost at the 2.5% significance level,

(iii) approximately 2.8 values are lost at the 1% significance level,

if the tests based on $t_1$ are used instead of the corresponding tests based on $t_2$.
Examination of Table I shows that the number of sample values lost decreases as n
increases for n > 15.
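
The rough summary can be compared with Table II directly. The sketch below takes the number of values lost to be 2n(1 - efficiency), which is an interpretation of the summary, not a formula stated in the note; the efficiencies are the Table II entries for n = 4 to 15, and the function names are hypothetical.

```python
# Table II efficiencies (percent) for the sample sizes listed in SIZES.
SIZES = [4, 5, 6, 7, 8, 9, 10, 15]
EFFICIENCY = {
    0.05: [82.5, 85, 87, 88.5, 90, 91, 92, 95.5],
    0.025: [77, 80, 82.5, 84.5, 86.5, 88.5, 90, 93],
    0.01: [73, 75.5, 78, 80, 82, 84, 85.5, 90],
}


def values_lost(n, efficiency_percent):
    """Of the 2n sample values, the number not effectively used (assumed definition)."""
    return 2 * n * (1 - efficiency_percent / 100)


def mean_lost(alpha):
    """Average number of values lost over the tabulated sample sizes."""
    lost = [values_lost(n, e) for n, e in zip(SIZES, EFFICIENCY[alpha])]
    return sum(lost) / len(lost)


for alpha in EFFICIENCY:
    row = [round(values_lost(n, e), 2) for n, e in zip(SIZES, EFFICIENCY[alpha])]
    print(alpha, row, round(mean_lost(alpha), 2))
```

On this reading the losses stay within a few tenths of the quoted 1.6, 2.1 and 2.8 over most of the range.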

REFERENCES 

[1] John E. Walsh, “On the power function of the sign test for slippage of means”,
Annals of Math. Stat., Vol. 17 (1946), pp. 360, 361.

[2] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution”,
Biometrika, Vol. 31 (1940), p. 376.

[3] J. Neyman, “Statistical problems in agricultural experimentation”, Roy. Stat. Soc.
Suppl., Vol. 2 (1935), pp. 131, 132.

NOTE ON THE LIAPOUNOFF INEQUALITY FOR ABSOLUTE MOMENTS 

By Maurice H. Belz 
The University of Melbourne 

For a variate x measured from the mean of the population, the absolute
moment of order r is defined by

$$\nu_r = \int_{-\infty}^{\infty} |x|^r \, dF(x),$$

where F(x) is the cumulative distribution function. Treating r as continuous,
we have

$$\frac{d\nu_r}{dr} = \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

the integral on the right existing if $\nu_{r+1}$ exists.



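Write $y = \log_e \nu_r$. Then we have

$$\nu_r \frac{dy}{dr} = \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

$$\nu_r^2 \frac{d^2 y}{dr^2} = \int_{-\infty}^{\infty} |x|^r \, dF(x) \cdot \int_{-\infty}^{\infty} |x|^r \log_e^2 |x| \, dF(x) - \left\{ \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x) \right\}^2$$

$\ge 0$, by Schwarz’s inequality.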



It follows that the function y is convex (or exceptionally a straight line), and,
on referring to the figure, it appears that

$$(1) \qquad MQ \le MQ'$$

for all chords PR. If the abscissae of the points L, M, N are c, b, a, respectively,
where $c \le b \le a$, the inequality (1) leads at once to the relation

$$\log_e \nu_b \le \frac{a - b}{a - c} \log_e \nu_c + \frac{b - c}{a - c} \log_e \nu_a.$$

Hence

$$\nu_b^{a-c} \le \nu_c^{a-b} \, \nu_a^{b-c},$$

which is the usual form of the Liapounoff Inequality.
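
A numerical check of the inequality on a small discrete distribution may make the statement concrete. The distribution below is made up for the illustration (symmetric, so that x is measured from the mean), and the function names are hypothetical.

```python
import math


def abs_moment(values, weights, r):
    """Absolute moment of order r for a discrete distribution."""
    return sum(w * abs(x) ** r for x, w in zip(values, weights))


def liapounoff_gap(values, weights, c, b, a):
    """(a-b) log nu_c + (b-c) log nu_a - (a-c) log nu_b; non-negative when the inequality holds."""
    def log_nu(r):
        return math.log(abs_moment(values, weights, r))

    return (a - b) * log_nu(c) + (b - c) * log_nu(a) - (a - c) * log_nu(b)


xs = [-2.0, -1.0, 1.0, 2.0]
ws = [0.1, 0.4, 0.4, 0.1]

# With c = 1, b = 2, a = 3 the inequality reads nu_2^2 <= nu_1 * nu_3.
print(abs_moment(xs, ws, 1), abs_moment(xs, ws, 2), abs_moment(xs, ws, 3))
print(liapounoff_gap(xs, ws, 1, 2, 3))
```

Here the three moments are 1.2, 1.6 and 2.4, so the inequality says 2.56 is at most 2.88, and the gap is the logarithm of their ratio.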

REMARK ON THE NOTE “A GENERALIZATION OF 
WARING’S FORMULA” 

By T. N. E. Greville 

U. S. Public Health Service 

Before submitting for publication the note “A generalization of Waring’s
formula,” Annals of Math. Stat., Vol. 15 (1944), pp. 218-219, the author made a
diligent effort to ascertain, through correspondence with mathematicians and
actuaries both in this country and abroad, whether the generalized formula in
question had been previously published, and none of the authorities
communicated with knew of its prior publication. However, it has now come to his
attention that the formula was published in essentially the same form by Hermite
in the article “Sur la formule d’interpolation de Lagrange”, Journal für die
Reine und Angewandte Mathematik (“Crelle’s Journal”), Vol. 84 (1878),
pp. 70-79.



ABSTRACTS OF PAPERS 

Presented Sept. 2-4, 1947, at the Yale meeting of the Institute

1. Estimation of Parameters in Truncated Pearson Frequency Distributions. 

A. C. Cohen, University of Georgia. 

Given a truncated univariate Pearson frequency distribution, parameters of the
complete distribution are required. Karl Pearson and Alice Lee (Biometrika, Vol. 6 (1915),
pp. 59-69) and R. A. Fisher (Introduction to Mathematical Tables, Vol. 1, British Assn.
Adv. Sci., 1931, pp. xxvi-xxxv) obtained solutions of the truncated normal distribution
with a single tail missing. The present paper presents three general methods of solution
applicable to any of the Pearson distributions. The first utilizes moments of a higher order
than are required to characterize corresponding complete distributions. The order of
the highest moment required is increased by one for each missing tail. The second method,
applicable when only a single tail is missing, utilizes the terminal ordinate at the point of
truncation and moments of the same order as required to characterize the complete
distribution. The terminal ordinate is evaluated by successive approximations. The third
method utilizes only the first two moments, but requires that the given distribution be
further truncated and that moments be computed both before and after the additional
truncations. This latter method can also be applied to complete distributions to avoid
direct computation of third and fourth order moments.

2. Distribution of a Root of Determinantal Equation. D. N. Nanda, University 

of North Carolina. 

The joint distribution of the roots of a determinantal equation was given by P. L. Hsu 
in 1939 and the distribution of any one of the roots was studied by S. N. Roy. The present 
paper, however, gives a different method of working out the distribution of any root, 
specified by its place in a monotonic arrangement. This method enables us to express the 
distribution of a root of a certain determinantal equation in terms of a linear combination 
of products of incomplete beta integrals and in terms of the distribution of a root of lower- 
order determinantal equations. 

3. The Power of Certain Non-Parametric Tests of Independence. Wassily 
Hoeffding, University of North Carolina. 

Several tests of independence have been proposed which are based on statistics depending
only on the ranks of the sample values. Under the hypothesis $H_0$ of independence the
distribution of such statistics does not depend on the form of the parent distribution.
Two of these statistics, Spearman’s rank correlation coefficient and Lindeberg-Kendall’s
statistic based on the number of inversions in the permutation of the ranks, are shown to
be asymptotically normally distributed in samples from any population (the limiting
normal distribution being singular in certain degenerate cases). The asymptotic distribution
of these coefficients reveals that the corresponding tests of independence are inconsistent
(in the sense that the probability of rejecting $H_0$ does not necessarily tend to 1 if $H_0$ is not
true), and at least one of them is biased in the limit. It can be shown that at least for some
sample sizes and some sizes of the critical region there do not exist unbiased tests of
independence based on ranks. But there do exist rank tests of independence which are
consistent, and hence unbiased in the limit. Examples of such tests are given.

4. Some Significance Tests for the Mean Using the Sample Range and Midrange. 

John E. Walsh, Princeton University. 



Consider a sample of size n, (2 < n < 10), drawn from a normal population with mean $\mu$.
Let $x_n$ be the largest value and $x_1$ the smallest value of the sample. Significance tests are
developed to compare $\mu$ with a given hypothetical value $\mu_0$ by use of the sample. These
significance tests are based on the quantity $D = [\tfrac{1}{2}(x_1 + x_n) - \mu_0]/(x_n - x_1)$ = [(sample
midrange) - (hypothetical mean)]/(sample range). One-sided and symmetrical tests are
considered. Values of $D_\alpha$ such that $\Pr(D > D_\alpha \mid \mu = \mu_0) = \alpha$ are computed for $\alpha$ = .05, .025,
.01, .005. These values of $D_\alpha$ can be used to obtain one-sided tests at the .05, .025, .01, .005
significance levels and symmetrical tests at the .10, .05, .02, .01 significance levels.
Efficiencies are computed for one-sided tests at the .05 and .01 significance levels. The
efficiency is at least 90% for n < 6 at the .05 significance level and for n < 8 at the .01 level.
The range-midrange test can be applied without computation through the use of an easily
constructed graph. The application of a test requires only the plotting of the sample
point $(x_1, x_n)$ on this graph.
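
The quantity D is simple to compute. The sketch below does only that; the critical values $D_\alpha$ are not given in the abstract and are not supplied here, and the function name and sample values are made up for the illustration.

```python
def range_midrange_statistic(sample, mu0):
    """D = [(sample midrange) - (hypothetical mean)] / (sample range)."""
    smallest, largest = min(sample), max(sample)
    return (0.5 * (smallest + largest) - mu0) / (largest - smallest)


sample = [2.0, 3.5, 5.0, 4.0]
print(range_midrange_statistic(sample, mu0=3.0))
```

Only the smallest and largest values enter, which is what allows the test to be read from a graph of the sample point.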

5. Testing Compound Symmetry in a Normal Multivariate Distribution. David 
F. Votaw, Jr., Princeton University. 

Let F(X) be the d.f. of a t-order vector variate X $(t \ge 3)$. Suppose the components of X
are divided into mutually exclusive and exhaustive subsets. F(X) is said to be compound
symmetric, for the given division of its variates into subsets, if it is invariant over all
permutations of its variates within these subsets. F(X) is completely symmetric if the
invariance holds over all permutations of its variates. If F(X) is normal and compound
symmetric, then within each subset of variates the means are equal, the variances are equal
and the covariances are equal, and between any two subsets of variates the covariances
are equal. Testing hypotheses of compound or complete symmetry in a normal F(X)
is of interest, for example, in studying psychological examinations and in medical research.

In this paper likelihood ratio criteria are developed for testing various hypotheses
involving compound symmetry in regard to a normal distribution and to k normal
distributions $(k \ge 2)$. Given that the corresponding null hypothesis is true, the moments
of each criterion are obtained explicitly and the distribution of each criterion is identified
as the product of independent beta variates (in the case of a single normal distribution,
the distributions are given explicitly for t = 3, 4, and 5 for certain divisions of the variates
into subsets). In a previous paper Wilks has given results on a very thorough study of
the sampling theory of likelihood ratio criteria for various hypotheses involving complete
symmetry in regard to a normal distribution.

6. Effects of Non-Normality at High Significance Levels. Harold Hotelling,
University of North Carolina.

The effects of non-normality in the underlying population on the probabilities of
significance by customary statistical tests are not well understood, in spite of numerous
attacks, both mathematical and experimental, on the problem. Chung’s recent proof that
the Student ratio t has, in samples from an arbitrary population, a distribution
approaching normality for large samples tends to confirm the common idea that
non-normality makes little difference if only the sample is fairly large, but this holds
only for a fixed range of values of t while the sample number N increases. The tail areas
beyond a deviation which increases with N in certain ways often behave quite differently
than in sampling from a normal population. If p is the probability that $|t| > t_0$ in
samples of N from a normal population and $p'$ is the corresponding probability for another
population, it is shown that $\lim_{N \to \infty} (p'/p)$ may be zero or infinite or may take any
finite value, even when the non-normal distribution involved is of simple and realistic
continuous forms. The conditions that this limit be unity are concerned only with the
shoulders of the population histogram, and have nothing to do with its moments or its
behavior at infinity or at its mean. This suggests that caution should be used in applying
familiar tests with high significance levels; that further calculations should be directed
toward making this caution quantitatively definite; and that the use of sample moments
or cumulants cannot lead to the most appropriate criterion of non-normality for this
purpose.

7. On the Problem of Similar Regions. E. L. Lehmann, University of
California, Berkeley, and Henry Scheffé, University of California, Los Angeles.

If $X = (X_1, \cdots, X_n)$ is a set of random variables with a joint probability density
depending on a set of parameters $\theta = (\theta_1, \cdots, \theta_m)$, and if $T = (T_1, \cdots, T_m)$ is a set of
sufficient statistics for $\theta$, then Neyman (Phil. Trans. Roy. Soc. London, Vol. 236 (1937),
pp. 333-380) has proved that a region $w$ in the space of X is similar with respect to $\theta$ if it
has the following structure: The intersections $w(t)$ of $w$ with the surfaces T = t have the
property that the conditional probability of the sample point X falling into $w$ given that
T = t does not depend on t. In the present paper a necessary and sufficient condition is
found for the regions with the above structure to be the only similar regions. This
condition is shown to be satisfied for a certain class K of probability densities which contains
as special cases all densities for which the totality of similar regions has been previously
determined. In particular the partial differential equations which Neyman (Annals of
Math. Stat., Vol. 12 (1941), pp. 46-76) assumed were satisfied in his solution of the problem
of similar regions are solved and it is shown that any density satisfying these equations
belongs to the above class K.

8. Fourth Degree Exponential Function. L. A. Aroian and Marguerite 
Darkow, Hunter College. 

It is shown that the fourth degree exponential function is supported by the Bernoulli 
probability function and the hypergeometric probability function as well as being the 
function for which the method of moments is the best method according to the criterion of 
maximum likelihood. In the general situation six moments, at most, are needed. The 
function is classified into two general groups depending on symmetry or asymmetry and 
each case is divided again into unimodal and bimodal distributions. Examples show that 
the function is very successful in graduating the main Pearson types and the Gram-Charlier 
Type A frequency function. Various generalizations of the exponential function are 
indicated. In addition to its wide generality, the greatest practical advantage of the new 
system is the simplicity of the numerical calculations. 


9. A General Weak Limit Theorem for Independent Distributions. P. L. Hsu, 
University of North Carolina. (Read by title.) 


For every positive integer n let there be n distribution functions $F_{n1}(x), F_{n2}(x),
\cdots, F_{nn}(x)$. Assume that $\lim_{n \to \infty} \max_{1 \le j \le n} \{1 - F_{nj}(x) + F_{nj}(-x)\} = 0$. Let $F_n(x)$
be the convolution $F_{n1}(x) * F_{n2}(x) * \cdots * F_{nn}(x)$. Let

$$\psi(t) = imt + \int_{-\infty}^{+\infty} \left[ e^{itx} - 1 - \frac{itx}{1 + x^2} \right] \frac{1 + x^2}{x^2} \, dG(x),$$

with $G(x)\uparrow$ and $G(\infty) - G(-\infty) < \infty$. Let F(x) be the
(infinitely divisible) distribution law having $\exp \psi(t)$ as its characteristic function.
In order to have $\lim_{n \to \infty} F_n(x) = F(x)$ at every continuity point of F(x), it is necessary
and sufficient that the following relations hold at every x > 0 such that $\pm x$ are continuity
points of G(y):

$$\text{(I)} \qquad \lim_{n \to \infty} \sum_{j=1}^{n} \int_{|y| > x} dF_{nj}(y) = \int_{|y| > x} \frac{1 + y^2}{y^2} \, dG(y),$$

$$\text{(II)} \qquad \lim_{n \to \infty} \sum_{j=1}^{n} \left\{ \int_{|y| < x} y^2 \, dF_{nj}(y) - \left( \int_{|y| < x} y \, dF_{nj}(y) \right)^2 \right\} = \int_{|y| < x} (1 + y^2) \, dG(y),$$

$$\text{(III)} \qquad \lim_{n \to \infty} \sum_{j=1}^{n} \int_{|y| < x} y \, dF_{nj}(y) = m + \int_{|y| < x} y \, dG(y) - \int_{|y| > x} \frac{1}{y} \, dG(y).$$

10. On the Maximum Partial Sums of Sequences of Independent Random 
Variables. K. L. Chung, Princeton University. 

The asymptotic behavior of the maximum partial sums of a sequence of independent
random variables is studied in this paper. Two groups of new limit theorems are
established under general conditions. The first group deals with theorems of the weak type.
The limiting distribution of the maximum partial sums is obtained with an estimate of
the remainder, thus improving a recent result of Erdös and Kac. Another estimate is
obtained for a different domain of variation, which plays an essential role in the sequel.
These results correspond to the sharper forms of the central limit theorem. In the second
group, theorems of the strong type are obtained, giving precise lower bounds (in the sense
of probability) for the maximum partial sums. These results form the exact counterpart
to the general form of the law of the iterated logarithm, due to Feller, which gives the
precise upper bounds. A summary of the main results and methods has appeared in Proc.
Nat. Acad. of Sci., Vol. 33 (1947), pp. 132-136.

11. Some Results on the Distribution of Quadratic Forms From Gaussian 
Stochastic Processes. (Preliminary report). Herman Rubin, Cowles 
Commission. 

If one considers the estimation of the parameters of a Gaussian stochastic process 
whose elements are continuous functions from the functional values over a finite interval, 
one often finds that certain parameters can be estimated exactly, and certain parameters 
can not. This result often depends on the distribution of quadratic functionals whose 
arguments are elements of the stochastic process under consideration. In this paper, it 
is shown that the elements of a certain class of quadratic functionals have distributions 
concentrated at a point, and that the elements of a different class do not; in this latter case, 
the characteristic function is computed. 

12. Some Significance Tests for the Median which are Valid under Very General 
Conditions. (Preliminary Report) John E. Walsh, Princeton University. 
(Read by title.) 

Consider n independent values drawn from populations necessarily satisfying only: 1)
Each population has a unique median. 2) The median has the same value $\varphi$ for each
population. 3) Each population is symmetrical. 4) Each population is continuous. (It
is to be emphasized that no two of the values are necessarily drawn from the same
population.) Significance tests are derived for $\varphi$ on the basis of 1)-4). These significance tests
are based on order statistics of certain combinations of order statistics, each combination
being either a single order statistic of the n values or one-half the sum of two order statistics.
The tests are invariant under permutation of the n values and reasonably efficient if the
values represent a sample from a normal population. The significance levels are of the
form $r/2^n$, $(r = 1, \cdots, 2^n - 1)$. Each value of r can be obtained for some one-sided
significance test. Thus any significance level can be closely approximated if n is large. The
major disadvantage of these tests is the limited number of suitable significance levels
available for small values of n. This disadvantage is partially eliminated by the development of
tests which have a specified significance level if the values are a sample from a normal
population and a significance level bounded near this specified value if only 1)-4) necessarily
hold. Results based on 1)-4) are applied to several well known statistical problems:
Tests are obtained for the mean on the basis of a large number of independent values from
populations having the mean but little else in common. Also generalized results are
obtained for the Behrens-Fisher problem, quality control, slippage tests, the sign test and
cases where some of the n values are dependent.

13. Loss of Information in t-tests with Unbalanced Samples. (Preliminary
Report) John E. Walsh, Princeton University. (Read by title.)

Consider two normal populations, the first with mean $a_1$ and variance $\sigma_1^2$, the second with
mean $a_2$ and variance $\sigma_2^2$, while $\sigma_1/\sigma_2$ has a known value C. If the hypothesis $a_1 = a_2$ is to
be tested by a t-test (one-sided or symmetrical) using $n_1$ sample values from the first
population and $n_2$ values from the second population ($n_1 + n_2 = n$, fixed), it is shown that this
experiment is most powerful when $n_1/n_2 = \sigma_1/\sigma_2$ (integer considerations neglected). The
t-tests satisfying this condition will be referred to as balanced t-tests. Thus information
will be lost by not using a balanced experiment. A quantitative measure of the information
lost by using given values of $n_1$ and $n_2$ is determined by the total sample size m, ($m_1 + m_2 =
m$), of the balanced t-test (same significance level) which has approximately the same power.
Then n - m sample values are wasted by using $(n_1, n_2)$ rather than $(m_1, m_2)$, i.e. only
100m/n% of the information obtainable per observation is used by $(n_1, n_2)$. A
symmetrical t-test with significance level $2\alpha$ has the same value of m as a one-sided t-test with
significance level $\alpha$. For one-sided t-tests with significance level $\alpha$: $m = \tfrac{1}{2}(B + \sqrt{B^2 - 8A})$,
where $B = 2 + A + K_\alpha^2/2$, $A = (C + 1)^2 [1 - K_\alpha^2/2(n - 2)][C^2/n_1 + 1/n_2]^{-1}$, and $K_\alpha$ is the
standardized normal deviate exceeded with probability $\alpha$. This approximation to m is
valid for m > 5 if $\alpha$ = .05, m > 6 if $\alpha$ = .025, m > 7 if $\alpha$ = .01, m > 8 if $\alpha$ = .005. (A
fractional value of m represents an interpolated measure of the sample size of the equivalent
balanced experiment.)
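
The approximation for m can be evaluated directly. The sketch below assumes the reading $m = \tfrac{1}{2}(B + \sqrt{B^2 - 8A})$ with $B = 2 + A + K_\alpha^2/2$ and $A = (C + 1)^2 [1 - K_\alpha^2/(2(n - 2))] / (C^2/n_1 + 1/n_2)$; one check on this reading is that it returns m = n when the experiment is already balanced. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def equivalent_balanced_size(n1, n2, C, alpha):
    """Total size m of the balanced one-sided t-test with about the same power (assumed formula)."""
    n = n1 + n2
    K = NormalDist().inv_cdf(1 - alpha)  # normal deviate exceeded with probability alpha
    A = (C + 1) ** 2 * (1 - K * K / (2 * (n - 2))) / (C * C / n1 + 1 / n2)
    B = 2 + A + K * K / 2
    return 0.5 * (B + math.sqrt(B * B - 8 * A))


# Balanced case (n1 / n2 = C): no information is lost, so m equals n.
print(equivalent_balanced_size(10, 10, 1.0, 0.05))

# Unbalanced case with equal variances: some of the 20 values are wasted.
m = equivalent_balanced_size(15, 5, 1.0, 0.05)
print(m, 100 * m / 20)
```

The second figure printed for the unbalanced case is the percentage 100m/n of the information used.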

14. Some Theorems on the Bernoullian Multiplicative Process. T. E. Harris,
Princeton University. (Read by title.)

A single entity may have j descendants with probability $p_j$, $(j = 0, 1, 2, \cdots)$. Each
first generation entity has then the same procreative probabilities, etc. Let

$$f(s) = p_0 + p_1 s + \cdots$$

If $z_n$ is the number of entities in the nth generation, it is known that $P(z_n = j)$ is given by
the coefficient of $s^j$ in the nth iterate $f[f \cdots (f)] = f_n(s)$. Let $E z_1 = x$, $1 < x < \infty$.
Conditions are given insuring that as $n \to \infty$ the cumulative distribution of the variate $z_n/x^n$
approaches a limit-function which is absolutely continuous except for a possible single
jump. Let g(u) be the corresponding frequency function. If f(s) is a polynomial of degree
k, let $q = \log_x k/(\log_x k - 1)$. Otherwise, q = 1. Then $g(u) \cdot \exp\{u^{q+\epsilon}\}$ [is, is not] summable
$(0, \infty)$ according as $\epsilon$ is [negative, positive]. Behavior of g(u) near u = 0 is also considered.
Special cases are considered where $g(u) = \text{constant} \cdot \ldots$, m a positive integer.
Maximum likelihood estimates for the parameters $p_0, p_1, \cdots$, and x are obtained as functions
of n successive values $z_1, z_2, \cdots, z_n$. Consistency, in a certain sense, is proved. A
specialized method is given for finding the moment-generating function of the variate N,
the smallest value of n such that $z_n = 0$.
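
For a polynomial f(s) the iterate $f_n(s)$ can be formed explicitly, which gives the distribution of $z_n$ as its coefficients. The sketch below does this with plain coefficient lists; the offspring probabilities are made up for the illustration, the function names are hypothetical, and a single entity is assumed in generation 0.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(size)]


def compose(f, g):
    """Coefficients of f(g(s)), by Horner's rule on polynomials."""
    result = [0.0]
    for coeff in reversed(f):
        result = poly_add(poly_mul(result, g), [coeff])
    return result


def generation_distribution(p, n):
    """Coefficients of the nth iterate f_n(s); entry j is P(z_n = j)."""
    f_n = [0.0, 1.0]  # the identity s: one entity in generation 0
    for _ in range(n):
        f_n = compose(f_n, p)
    return f_n


p = [0.2, 0.3, 0.5]  # p_0, p_1, p_2, so that x = E z_1 = 1.3
dist = generation_distribution(p, 2)
mean = sum(j * prob for j, prob in enumerate(dist))
print(dist[0], mean)
```

The first printed value is the probability of extinction by the second generation, f(f(0)), and the mean of $z_2$ equals $x^2$, in line with the normalization $z_n/x^n$ used in the abstract.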



NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. George E. Albert has been appointed to an associate professorship at the 
University of Tennessee. 

Dr. T. W. Anderson, Jr. has been promoted to an assistant professorship in
the Department of Mathematical Statistics at Columbia University. He is on
leave the first half of the 1947-48 academic year at the Institute of Actuarial
Mathematics and Mathematical Statistics, Stockholm University as a Guggenheim
Fellow. During the second half of the academic year he will be at Cambridge
University.

Associate Professor Max Astrachan has been promoted to a full professorship 
at Antioch College, Yellow Springs, Ohio. 

Associate Professor T. A. Bancroft, who has been at the University of Georgia, 
Athens, Georgia, is now with the Statistical Laboratory, Alabama Polytechnic 
Institute, Auburn, Alabama. 

Dr. M. S. Bartlett of Cambridge University has been appointed as Professor 
of Mathematical Statistics at the University of Manchester, Manchester, 
England. The position is a newly created one. Professor Bartlett indicates 
that this position is believed to be the first official professorship in mathematical 
statistics in England. 

Professor M. A. Brumbaugh has accepted a position with the Bristol Laboratories
Inc., Syracuse 1, New York.

Dr. Donald A. Darling has been appointed Research Associate at Cornell
University.

Professor D. B. DeLury of the Virginia Polytechnic Institute has accepted a 
position with the Ontario Research Foundation, 43 Queens Park, Toronto 5, 
Canada. 

Professor Abel Gauthier of the University of Montreal has been appointed 
Head of the Institute of Mathematics and Assistant-Secretary of the Faculty of 
Science at that institution. 

Dr. Casper Goffman, former assistant professor in the Mathematics Department,
University of Kentucky, is now in the Mathematics Department, University
of Oklahoma, Norman, Oklahoma.

Mr. Philip Hardy has returned to the General Electric Company at Warren, 
Ohio after serving at Wright Field. 

Dr. Carl F. Kossack, who has been with the Navy Department in Washington, 
D. C. as an Air Intelligence Specialist, has accepted an associate professorship in 
the Department of Mathematics at Purdue University. 

Mr. Frank Jones Massey, Jr. is now teaching in the Department of Mathematics,
University of Maryland, College Park, Maryland.

Dr. William Burton Michael, who has been Lecturer in Mathematics, Psychology
and Educational Psychology at the University of Southern California,
has now accepted an assistant professorship in the Department of Psychology,
Princeton University. He is also a member of the Research Department,
College Entrance Examination Board at Princeton.

Mr. Bernard Ostle, a former teaching assistant, School of Business Administration,
University of Minnesota, is now at Iowa State College, Ames, Iowa.

Mr. Maurice H. Quenouille, who was formerly with the Rothamsted Experimental
Station, Harpenden, Herts, England, has accepted the position of
Lecturer in Statistics, Marischal College, University of Aberdeen, Scotland.

Dr. James A. Rafferty left the Department of Pathology, University of
Rochester in June and has been appointed Chief of the Department of Statistics,
Air University, School of Aviation Medicine, Randolph Field, Texas.

Miss Mary Ann Savas has accepted a position with General Motors, Detroit, 
Michigan. 

Professor George J. Stigler, formerly with Brown University, is now teaching 
in the Department of Economics, Columbia University, New York, New York. 

Professor E. L. Welker has resigned an associate professorship in mathematics 
at the University of Illinois to become Associate in Mathematics in the Bureau of 
Medical Economic Research of the American Medical Association. 

Mr. Sol M. Wezelman, who completed his master's degree in actuarial science 
at the University of Michigan in June, has accepted a position as Assistant 
Actuary in the North Dakota State Department of Insurance, Bismarck. 

Dr. Bertram Yood has received his doctorate at Yale and is now on the staff 
at Cornell University. 

Mr. Earl K. Yost, Jr. has accepted a position with the General Electric Co. at
the Hanford Engineering Project, Richland, Washington.

Professor James G. Smith, of Princeton University, died at Princeton on 
November 28, 1940.


Beginning with the October issue, the quarterly journal Mathematical Tables 
and Other Aids to Computation will publish a new feature section, “Automatic 
Computing Machinery,” designed to disseminate information and news on 
research and development in the field of high-speed automatic calculating 
machinery. Material should fall under the general headings of Bibliography, 
Technical Developments, Discussion (including correspondence), and News. 
Contributions to this section are invited and should be addressed to Dr. E. W. 
Cannon, Head of the Mathematics Group, Machine Development Laboratory, 
National Bureau of Standards, Washington, D. C. 


Institute of Numerical Analysis Established 
Plans have been completed for the establishment of one of the newest units of 
the National Bureau of Standards—the Institute of Numerical Analysis—at the 





University of California at Los Angeles, according to an announcement by Dr. 
Edward U. Condon, Director of the Bureau. 

One of the giant high-speed electronic computing machines, now under development
by the Bureau of Standards, will be installed at the Institute when
completed. Design specifications call for high memory capacity and automatically
sequenced mathematical operations from start to finish at speeds
attainable only with electronic equipment.

The Institute has two primary functions. The first is research in applied 
mathematics aimed at developing methods of analysis which will extend the use 
of the high-speed electronic computers. The second is to act as a service group 
for Western industries, research institutions, and government agencies. The 
service function will include not only the use of the machines for problem solving 
but also assistance in the formulation of problems in applied mathematics of the 
more complex and novel types. Service operations are to be initiated immediately,
using the latest types of commercially available computing equipment.

The decision to locate the Institute at the University of California at Los
Angeles was made after a nation-wide survey by the National Bureau of Standards.
Centers in the East and Middle West were considered as well as the Far
West, but Los Angeles, it was decided, offered the widest range of possibilities 
for an Institute of Numerical Analysis. Concentration of aircraft industries and 
the presence of several major scientific institutions were critical in the choice of 
Los Angeles. 


Election of Fellows 

The Board of Directors announced at the Yale Meeting the election of the 
following members as Fellows of the Institute: Theodore W. Anderson, Jr., 
Alexander C. Aitken, David H. Blackwell, Georges Darmois, Ragnar Frisch,
Robert C. Geary, Frederick Mosteller, Gerhard Tintner, Charles P. Winsor and
John Wishart. 


New Members 


The following persons have been elected to membership in the Institute 
(June 1 to August 29, 1947)

Baldwin, Helen Mildred, B.S. (Cornell) Research Associate in Statistics, Atomic Energy 
Project, 215 Avenue C, Rochester 5, N. Y. 

Blunk, Paul M., A.B. Teaching asst. and grad. student, Univ. of Calif., Box 682, Fair
Oaks, Calif.

Bowden, George Edwin, B.S. (Duke) Teaching asst., Math. Dept., White Hall, Cornell 
Univ., Ithaca, N. Y.

Bradley, Ralph Allan, M.A. (Queen's Univ.) Grad. student, Univ. North Carolina,
Wellington, Ontario, Canada.

Burton, Kenneth John, Head of Statistics Section, British Employers’ Confederation, 16 
Rutherwyke Close, Ewell, Surrey, England. 

Carlson, Phillip G., Jr., A.M. (Columbia) 148 Cornell Street, Roslindale 31, Mass.





Carol, Bernard, M.S.E. (Columbia) Graduate student at Columbia Univ., 16 West 96th 
Street, N. Y. 

Clark, Sidney B., B.A. (George Wash. Univ.) Statistician, Public Roads Administration,
2728 Porter St., N.W., Wash., 8, D. C. 

Danielson, Theresa, M.A. (Univ. of Ill.) Mathematician at Brookhaven National Laboratory,
8512 Cambridge Ave., New York 63, N. Y.

Dineen, Russell D. F., B.A. (Univ. of Delaware) Graduate student at Univ. of Delaware, 
1818 French Street, Wilmington, Delaware. 

Diver, M. L., M.E. (Purdue) Consulting Engineer, P.O. Box 1016, San Antonio 6, Texas. 

Erasmus, Josias C., M.S.E. (Univ. of Stellenbosch, South Africa) Research Officer,
Grootfontein College of Agriculture, Middelburg, C-P, South Africa. 

Gottlieb, Morris J., Ph.D. (Wash. Univ., St. Louis) Member of the Institute for Advanced
Study, Washington University, St. Louis, Mo. 

Greenwood, Joseph Arthur, A.B. (Harvard) Student at Harvard University, 66 Oxford
St., Cambridge 88, Mass.

Gysbers, Jack C., M.A. (Univ. of Calif.) Teaching asst., Dept. of Math., Univ. of Calif.,
2029 Berkeley Way, Berkeley 4, Calif. 

Haskind, Mina, B.S. (Brooklyn College) Student at Brooklyn College, 768 Eastern Parkway,
Brooklyn 18, New York.

Hauser, Dr. Philip M., Ph.D. (Univ. of Chicago) University of Chicago, Chicago 37, Ill. 

Hoyt, Cyril J., Ph.D. (Univ. of Minn.) Research Associate, Dept. of Education,
University of Chicago, Chicago, Ill.

Kern, Enrique Roberto, First Assistant, Institute of Biometry, Univ. of Buenos Aires, 
Rivadavia 8854, Buenos Aires, Argentina. 

Mark, Abraham M., Ph.D. (Cornell) Mathematics Department, Univ. of Wisconsin, 
Madison, Wisconsin. 

Moss, George G., II, B.A. (St. John’s College, Annapolis) Actuarial Statistician,
Metropolitan Life, 2771 Morris Ave., N. Y. 68, N. Y.

Phillips, Bernard E., A.M. (Columbia) Box 147, Cathedral Station, New York 26, N. Y. 

Radvanyi, Laszlo, Ph.D. (Univ. of Heidelberg) Professor of Economics, National Univ. of 
Mexico, Donato Guerra 1, desp. 207, Mexico, D. F. 

Richardson, John M., Ph.D. (Cornell) Member of Technical Staff, Bell Telephone
Laboratories, Inc., Murray Hill, New Jersey.

Royston, Robert W., M.S. (Univ. of Mich.) Asst. Prof., Math. Dept., Wash. & Lee Univ., 
117 W. Washington St., Lexington, Virginia. 

Savas, Mary A., A.B. (Univ. of Mich.) Student at Univ. of Mich., 524 E. Second St.,
Monroe, Mich.

Shepard, David H., A.B. (Univ. of Mich.) Research Analyst, Army Security Agency, 
505 Randolph Street, Falls Church, Virginia. 

Throdahl, Monte C., B.S. (Iowa State College) Research Chemist in Charge of Rubber 
Lab., Monsanto Chemical Co., Nitro, West Virginia. 

Uchytil, Jan, Doctor of Science, Chief of Production Control Dep. in Central Federation 
of Czech. Industry, Praha II, Prikopy 14, Czech. 

Vergara, Jose, Doctor of Engineering, (Madrid) Professor of Economics, Madrid, Chief
of the Bureau of Statistics, Dept. of Agric., Madrid. 5262 S. Blackstone Ave., Chicago 
87, Illinois. 

Wei, Dzung-shu, Ph.D. (Univ. of Iowa) Prof. and Head of Math. Dept., St. John's Univ.,
Shanghai, 129 East 10th St., New York 8, N. Y. 

Wolfson, Jacob, B.A. (New York College) Statistician, Social Security Adm., 845
Brunswick Road, Essex, Maryland.



REPORT ON THE NEW HAVEN MEETING OF THE INSTITUTE 

The Tenth Summer Meeting of the Institute of Mathematical Statistics was 
held at Yale University, New Haven, Connecticut, Tuesday, September 2 
through Thursday, September 4, 1947. The meeting was held in conjunction 
with the summer meetings of the American Mathematical Society and the 
Mathematical Association of America. The following 150 members of the 
Institute attended the meeting: 

C. B. Allendoerfer, R. L. Anderson, H. E. Arnold, L. A. Aroian, H. M. Bacon, J. L. Barnes,
W. D. Baten, R. E. Bechhofer, A. A. Bennett, Joseph Berkson, D. H. Blackwell, C. I. Bliss,
Colin Blyth, Jr., A. E. Brandt, G. M. Brown, R. H. Brown, O. P. Bruno, P. T. Bruyere,
Mrs. P. T. Bruyere, J. H. Bushey, B. H. Camp, G. C. Campbell, Uttam Chand, K. L. Chung,
W. G. Cochran, A. C. Cohen, Jr., E. P. Coleman, T. F. Cope, G. M. Cox, C. C. Craig, E. L.
Crow, H. B. Curry, G. B. Dantzig, M. D. Darkow, B. B. Day, Bernard Dimsdale, C. E.
Dieulefait, C. W. Dunnett, Churchill Eisenhart, L. R. Elveback, M. W. Eudey, H. P. Evans,
William Feller, C. D. Ferris, M. M. Flood, R. M. Foster, H. A. Freeman, J. E. Freund, H. P.
Geiringer, M. J. Gottlieb, J. Arthur Greenwood, Evelyn Groosman, F. E. Grubbs, H. T.
Guard, P. R. Halmos, Max Halperin, M. H. Hansen, B. I. Hart, Mina Haskind, Wassily
Hoeffding, R. H. Hoskins, Harold Hotelling, A. S. Householder, Jaroslav Janko, Irving
Kaplansky, Leo Katz, Oscar Kempthorne, E. M. Kennedy, W. L. Kichline, C. J. Kirchen,
L. F. Knudsen, H. S. Konijn, C. F. Kossack, Jack Laderman, H. G. Landau, E. L. Lehmann,
R. A. Leibler, Walter Leighton, Jr., F. C. Leone, Joseph Lev, Howard Levene, Julius
Leiblein, Arthur Linder, S. B. Littauer, E. D. Lowry, H. F. MacNeish, P. J. McCarthy, John
Mandel, H. B. Mann, Sophie Marcuse, F. J. Massey, Margaret Merrell, E. B. Mode, M. E.
Moore, Frederick Mosteller, D. N. Nanda, P. M. Neurath, M. C. Neurdenburg, G. E.
Noether, M. L. Norden, H. W. Norton, P. S. Olmstead, A. L. O’Toole, E. R. Ott, T. E. Oxtoby,
Edward Paulson, M. P. Peisakoff, G. B. Price, J. A. Rafferty, L. J. Reed, C. J. Rees, P. R.
Rider, John Riordan, H. E. Robbins, Milton da Silva Rodrigues, A. C. Rosander, Ernest
Rubin, Herman Rubin, Frank Saidel, M. M. Sandomire, Arthur Sard, Max Sasuly, F. E.
Satterthwaite, E. D. Schell, Jack Sherman, Rosedith Sitgreaves, Andrew Sobczyk, Milton
Sobel, Herbert Solomon, Mortimer Spiegelman, Arthur Stein, Henry Teicher, R. M. Thrall,
Gerhard Tintner, M. N. Torrey, J. W. Tukey, D. F. Votaw, Jr., Abraham Wald, H. M.
Walker, J. E. Walsh, R. M. Walter, J. H. Watkins, Dzung-shu Wei, E. S. Weiss, S. S. Wilks,
C. P. Winsor, H. O. Wold, Jacob Wolfowitz, C. A. Wright, Bertram Yood.

The Tuesday afternoon session was devoted to a symposium on 2 x 2 tables 
with Professor Lowell J. Reed of Johns Hopkins University serving as chairman. 
Addresses were given on Tests of Significance by Dr. Churchill Eisenhart, National
Bureau of Standards; Estimation by Dr. Charles P. Winsor, Johns Hopkins
University; and Non-Standard Cases by Dr. Joseph Berkson, Mayo Clinic.
Discussants were Mr. William F. Taylor, Dr. Frederick Mosteller, Professor 
David H. Blackwell and Professor John W. Tukey. The attendance was 
approximately 130. 

The first Wednesday morning session was devoted to contributed papers. 
Professor John W. Tukey of Princeton University presided. The attendance 
was approximately 85. The following three papers were presented: 

1. Estimation of Parameters in Truncated Pearson Frequency Distributions. 

Professor A. C. Cohen, University of Georgia. 



2. Distribution of a Root of a Determinantal Equation.

Mr. D. N. Nanda, University of North Carolina. 

3. The Power of Certain Non-Parametric Tests of Independence. 

Dr. Wassily Hoeffding, University of North Carolina. 

The second Wednesday morning session was held with Professor Will Feller, 
President of the Institute, presiding. Professor R. A. Fisher, University of 
Cambridge, gave the address under the title The Fitting of Gene Frequencies to 
Data for Genotypes. The attendance was approximately 160. 

The membership business meeting of the Institute was held at 9:15, Thursday 
morning, in 102 Chittenden Hall with President Feller presiding. The attendance
was approximately 55. It was voted to make certain changes in the
By-Laws and in particular to raise the dues to $7.00 per year. (An exception is
made for those living outside the Western Hemisphere.) Morris Hansen,
Chairman of the Committee on Planning and Development, initiated a lively 
discussion with reference to desirable changes in the Constitution. 

On Thursday morning at 10:30, with President Feller presiding, Professor A. 
Wald of Columbia University presented the Henry Lewis Rietz Lecture on 
Sequential Estimation and Multi-Decisions. The attendance was approximately 
150. 

A joint session with the American Mathematical Society was held early 
Thursday afternoon at which Professor S. S. Wilks of Princeton University gave 
a lecture on Sampling Theory of Order Statistics. Professor Harold Hotelling of 
the University of North Carolina was the presiding officer. The attendance was 
approximately 300. 

This session was followed by another joint session with the American Mathematical
Society which was devoted to contributed papers. Professor John W.
Tukey presided at this session and the attendance was approximately 115. The 
following seven papers were presented: 

1. Some Significance Tests for the Mean Using the Sample Range and Midrange. 

Mr. John Walsh, Princeton University. 

2. Testing Compound Symmetry in a Normal Multivariate Distribution. 

Dr. David F. Votaw, Jr., Princeton University. 

3. Effects of Non-Normality at High Significance Levels.

Professor Harold Hotelling, University of North Carolina.

4. On the Problem of Similar Regions. 

Dr. Erich L. Lehmann, University of California, Berkeley and Professor Henry
Scheffé, University of California at Los Angeles.

5. The Fourth Degree Exponential Function. 

Dr. Leo A. Aroian and Professor Marguerite Darkow, Hunter College. 

6. On the Maximum Partial Sums of Sequences of Independent Distributions. 

Dr. K. L. Chung, Princeton University. 

7. Some Results on the Distribution of Quadratic Forms from Gaussian Stochastic
Processes.

Mr. Herman Rubin, Cowles Commission.

The following four papers were presented by title: 

8. A General Weak Limit Theorem for Independent Distributions. 

Professor P. L. Hsu, University of North Carolina. 





9. Some Significance Tests for the Median which are Valid under Very General
Conditions (Preliminary Report).

Mr. John E. Walsh, Princeton University. 

10. Loss of Information in t-tests with Unbalanced Samples (Preliminary Report). 
Mr. John E. Walsh, Princeton University. 

11. Some Theorems on the Bernoullian Multiplicative Process. 

Mr. T. E. Harris, Princeton University. 

Abstracts of all these papers appear elsewhere in this issue of the Annals.

A beer party in honor of the foreign statisticians attending the meeting was 
held in the dining room of Saybrook College on Wednesday evening. A joint 
dinner with the American Mathematical Society and the Mathematical Association
of America was held on Thursday evening.

C. C. Craig, 

Acting Secretary.












MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical literature
of the world, with full subject and author indices

Publication of this journal is sponsored by the American Mathematical
Society, Mathematical Association of America, Institute of
Mathematical Statistics, London Mathematical Society, Edinburgh
Mathematical Society, Union Matematica Argentina, and others.

Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $13.00 per year. 

Send subscription order or request for sample copy to 

AMERICAN MATHEMATICAL SOCIETY 
531 West 116th Street, New York City 27 


JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 

SEPTEMBER, 1947 
Articles 

On a Population Sample for Greece. Raymond J. Jessen, Richard H. Blythe, Jr., Oscar Kempthorne and W. Edwards Deming

Estimating the Resident Alien Population of the United States. E. P. Hutchinson and Ernest Rubin

On the Use of Soviet Statistics. Harry Schwartz

A Frequency Distribution Represented as the Sum of Two Poisson Distributions. Walter Schilling, M.D.

The Relation of Control Charts to Analysis of Variance and Chi-Square Tests. Henry Scheffé

Problems in Providing Adequate Statistics on Business Profits. Susan S. Burr

Sampling for the 1947 Survey of Consumer Finances. Roe Goodman

Contribution of Psychological Data to Economic Analysis. George Katona

Statistical Methodology Index, No. 9. Oscar Krisen Buros


AMERICAN STATISTICAL ASSOCIATION 
1603 K Street, N. W., Washington 6, D. C. 















